Abstract

In this paper we address the problem of finding the sensing capacity of sensor networks for a class of
linear observation models and a fixed SNR regime. Sensing capacity is defined as the maximum number
of signal dimensions reliably identified per sensor observation. In this context sparsity of the phenomenon
is a key feature that determines sensing capacity. Setting aside the SNR of the environment, the effect of
sparsity on the number of measurements required for accurate reconstruction of a sparse phenomenon has
been widely dealt with under compressed sensing. Nevertheless the development there was motivated
from an algorithmic perspective. In this paper our aim is to derive these bounds in an information theoretic
set-up and thus provide algorithm independent conditions for reliable reconstruction of sparse signals.
In this direction we first generalize Fano's inequality and provide lower bounds to the probability
of error in reconstruction subject to an arbitrary distortion criterion. Using these lower bounds to the
probability of error, we derive upper bounds to sensing capacity and show that for a fixed SNR regime
sensing capacity goes down to zero as sparsity goes down to zero. This means that disproportionately
more sensors are required to monitor very sparse events. We derive lower bounds to sensing capacity
(achievable) by deriving upper bounds to the probability of error via adaptation to a max-likelihood
detection set-up under a given distortion criterion. These lower bounds to sensing capacity exhibit similar
behavior, though there is an SNR gap between the upper and lower bounds. Subsequently, we show the effect
of correlation in sensing across sensors and across sensing modalities on sensing capacity for various
degrees and models of correlation. Our next main contribution is that we show the effect of sensing
diversity on sensing capacity, an effect that has not been considered before. Sensing diversity is related
to the effective coverage of a sensor with respect to the field. In this direction we show the following
results: (a) sensing capacity goes down as sensing diversity per sensor goes down; (b) random sampling
(coverage) of the field by sensors is better than contiguous location sampling (coverage). In essence the
bounds and the results presented in this paper serve as guidelines for designing efficient sensor network
architectures.

I. Introduction 

In this paper we study fundamental limits to the performance of sensor networks for a class of 
linear sensing models under a fixed SNR regime. Fixed SNR is an important and necessary ingredient 
for sensor network applications where the observations are inevitably corrupted by external noise and 
clutter. In addition we are motivated by sensor network applications where the underlying phenomenon
exhibits sparsity. Sparsity is manifested in many applications for which sensor networks are deployed,
e.g. localization of a few targets in a large region, search for targets from among a large number of sites
(e.g. land mine detection), and estimation of temperature variation for which a few spline coefficients may
suffice to represent the field, i.e. the phenomenon is sparse under a suitable transformation. More recent
applications such as that considered in [1] also involve imaging a sparse scattering medium.

The motivation for considering linear sensing models comes from the fact that in most cases the 
observation at a sensor is a superposition of signals that emanate from different sources, locations etc. 
For example, in seismic and underground borehole sonic applications, each sensor receives a signal that is a
superposition of signals arriving from various point/extended sources located at different places. In radar
applications [1], [2], under a far field assumption the observation system is linear and can be expressed
as a matrix of steering vectors. In this case the directions become the variable space and one looks for
strategies to optimally search using many such radars. Statistical modulation of gain factors in different 
directions is feasible in these scenarios and is usually done to control the statistics of backscattered data. 
In other scenarios the scattering medium itself induces random gain factors in different directions. 

In relation to signal sparsity, compressive sampling [3], [4] has been shown to be very promising in terms
of acquiring minimal information, expressed as a minimal number of random projections, that suffices
for adequate reconstruction of sparse signals. Thus in this case too, the observation model is linear. In
[5] this set-up was used in a sensor network application for realizing an efficient sensing and information
distribution system by combining it with ideas from linear network coding. It was also used in [6] to
build a wireless sensor network architecture using a distributed source-channel matched communication 
scheme. 

For applications related to wireless sensor networks where power limited sensors are deployed, it
becomes necessary to compress the data at each sensor. For example, consider a parking surveillance system
where a network of wireless low resolution cameras is deployed [7]. With each camera taking several
snapshots in space, transmitting all of them to a base station will overwhelm the wireless link to
the base station. Instead, transmission overhead is significantly reduced by sending a weighted sum of
the observations. An illustration is shown in figure 1. A similar set-up was also considered in [8] for a
robotic exploration scenario.

Fig. 1. A schematic of I-Park: a parking lot monitoring system.

Motivated by the scenarios considered above we start with sensing (observation) models where at a
sensor the information about the signal is acquired as a projection of the signal onto a weight vector.
Under this class of observation models, the sensing model is linear and is essentially a matrix
$\mathbf{G} \in \mathbb{R}^{m \times n}$ chosen from some appropriate class particular to the application. In this work we
consider a fixed SNR model (see also [9]) where the observations at $m$ sensors for the signal
$\mathbf{X} \in \mathcal{X}^n$ are given by

$$\mathbf{Y} = \sqrt{\mathrm{SNR}}\, \mathbf{G}\mathbf{X} + \mathbf{N} \tag{1}$$

where each row of the matrix $\mathbf{G}$ is restricted to have a unit $\ell_2$ norm and where $\mathbf{N}$ is the noise vector
with unit noise power in each dimension. It is important to consider the fixed SNR scenario particularly for
applications related to sensor networks. Practically each sensor is power limited. In an active sensing
scenario the sensors distribute this power to sense different modalities, or to look (beamform) in various
directions. Thus we restrict the $\ell_2$ norm of each row of $\mathbf{G}$ to be unity and then scale the system model
appropriately by SNR. For a networked setting we assume that the observations made at the sensors
are available for processing at a centralized location or node. When this is infeasible or costly,
information can be exchanged or aggregated at each sensor using distributed consensus type algorithms,
such as that studied in [10].
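
The fixed SNR observation model of equation (1) can be simulated directly. The following is a minimal sketch using only the Python standard library; the sizes `m`, `n`, the SNR value, the sparsity level 0.05 and the helper names `unit_norm_gaussian_rows` and `observe` are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def unit_norm_gaussian_rows(m, n, rng):
    """Draw an m x n Gaussian matrix and scale each row to unit l2 norm."""
    rows = []
    for _ in range(m):
        row = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(g * g for g in row))
        rows.append([g / norm for g in row])
    return rows


def observe(G, x, snr, rng):
    """Return Y = sqrt(SNR) * G x + N with N i.i.d. standard Gaussian."""
    scale = math.sqrt(snr)
    return [scale * sum(g * xi for g, xi in zip(row, x)) + rng.gauss(0.0, 1.0)
            for row in G]


rng = random.Random(0)          # fixed seed, illustrative only
m, n, snr = 20, 100, 10.0       # hypothetical sizes and SNR
x = [1.0 if rng.random() < 0.05 else 0.0 for _ in range(n)]  # sparse 0/1 signal
G = unit_norm_gaussian_rows(m, n, rng)
y = observe(G, x, snr, rng)
print(len(y), round(math.sqrt(sum(g * g for g in G[0])), 6))
```

Because every row of `G` has unit norm and the noise has unit variance, the single scalar `snr` controls the signal-to-noise level, which is the point of the normalization described above.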

In order to utilize information theoretic ideas and tools, we adopt a Bayesian perspective and assume
a prior distribution on X. Another motivation for considering a Bayesian set-up is that one can potentially
model classification/detection scenarios where prior information is usually available and is useful. Note
that under some technical conditions it can be shown that a lower bound to the Bayesian error is also
a lower bound to the worst case probability of error for the parametric set-up. Therefore the lower bounds
presented in this paper also provide lower bounds to the parameter estimation problem.

In this paper we capture the system performance by evaluating asymptotic upper and lower bounds
to the ratio $C(d_0) = \frac{n}{m}$ such that reconstruction to within a distortion level $d_0$ is feasible. We call the
ratio $C(d_0)$ the sensing capacity: the number of signal dimensions reliably identified per projection
(sensor). This term was coined in [11] in the context of sensor networks for discrete applications.
Alternatively, bounds to $C(d_0)$ can be interpreted as providing scaling laws for the minimal number
of sensors/projections required for reliable monitoring/signal reconstruction.

For a signal sparsity level of $k$, a different ratio of $\frac{k}{m}$ also seems to be a reasonable choice, but in most
cases $k$ is unknown and needs to be determined, e.g., target density, or sparsest signal reconstruction.
Here it is important to penalize false alarms and misclassification costs. Furthermore, $n$ and $m$ are known
and part of the problem specification, while signal complexity is governed by $k$, and one of our goals
is to understand performance as a function of signal complexity. In this paper we show that sensing
capacity $C(d_0)$ is a function of signal sparsity in addition to SNR.

The upper bounds to $C(d_0)$ are derived by finding lower bounds to the probability of error in
reconstruction subject to a distortion criterion, which apply to any algorithm used for reconstruction. The
achievable (lower) bounds to $C(d_0)$ are derived by upper bounding the probability of error in a max-likelihood
detection set-up over the set of rate distortion quantization points. Since most of the development for
these classes of problems has been algorithmic [3], [9], our motivation for the above development is
driven by the need to find fundamental algorithm independent bounds for these classes of problems. In
particular, under an i.i.d. model on the components of $\mathbf{X}$ that models a priori information, e.g. sparsity
of $\mathbf{X}$, and letting $\hat{\mathbf{X}}(\mathbf{Y})$ denote the reconstruction of $\mathbf{X}$ from $\mathbf{Y}$, we show that

$$\Pr\left( \frac{1}{n} d(\hat{\mathbf{X}}(\mathbf{Y}), \mathbf{X}) \ge d_0 \right) \ge \frac{R_X(d_0) - K(n, d_0) - \frac{1}{n} I(\mathbf{X};\mathbf{Y}|\mathbf{G})}{R_X(d_0)} - o(1) \tag{2}$$

for some appropriate distortion measure $d(\cdot,\cdot)$, where $R_X(d_0)$ is the corresponding scalar rate
distortion function and $K(n, d_0)$ is bounded by a constant that depends on the number of neighbors of a
quantization point in an optimal $n$-dimensional rate distortion mapping.

Next, we consider the effect of the structure of $\mathbf{G}$ on the performance. Using the lower bound
on the probability of error given by equation (2), a necessary condition for reconstruction to within an
average distortion level $d_0$ to be feasible is immediately identified, namely
$R_X(d_0) - K(n, d_0) \le \frac{1}{n} I(\mathbf{X};\mathbf{Y}|\mathbf{G})$. For a fixed prior on $\mathbf{X}$ the performance is then
determined by the mutual information term, which in turn depends on $\mathbf{G}$. This motivates us to consider the
effect of the structure of $\mathbf{G}$ on the performance, and by evaluating $I(\mathbf{X};\mathbf{Y}|\mathbf{G})$ for various ensembles
of $\mathbf{G}$ we quantify the performance of many different scenarios that restrict the choice of $\mathbf{G}$ for sensing.
When $\mathbf{G}$ is chosen independently of $\mathbf{X}$ and randomly from an ensemble of matrices (to be specified later
in the problem set-up), we have



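$$\begin{aligned}
I(\mathbf{X};\mathbf{Y},\mathbf{G}) &= \underbrace{I(\mathbf{X};\mathbf{G})}_{=0} + I(\mathbf{X};\mathbf{Y}|\mathbf{G}) && (3) \\
&= I(\mathbf{X};\mathbf{Y}) + I(\mathbf{X};\mathbf{G}|\mathbf{Y}) && (4) \\
\Rightarrow\; I(\mathbf{X};\mathbf{Y}|\mathbf{G}) &= I(\mathbf{X};\mathbf{Y}) + I(\mathbf{X};\mathbf{G}|\mathbf{Y}) && (5)
\end{aligned}$$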

This way of expanding allows us to isolate the effect of the structure of the sensing matrix G on the
performance, which in principle influences bounds on $C(d_0)$ through the change in mutual information
as captured by equations (3)-(5) and as applied to satisfy the necessary conditions prescribed by the
lower bound in equation (2).
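
The mutual information decomposition in equations (3)-(5) can be checked numerically on a small discrete example. The sketch below is only an illustration: the toy model (a binary X, a gain G taking values in {1, 2} independently of X, binary additive noise Z and Y = G X + Z), the parameter values and all variable names are assumptions made here and are not part of the paper's set-up.

```python
import math
from collections import defaultdict
from itertools import product

alpha = 0.2      # P(X = 1), hypothetical sparsity ratio
eps = 0.1        # P(Z = 1), hypothetical noise level

# Joint pmf of (X, G, Y) with X and G independent and Y = G * X + Z.
joint = defaultdict(float)
for x, g, z in product((0, 1), (1, 2), (0, 1)):
    p = (alpha if x else 1 - alpha) * 0.5 * (eps if z else 1 - eps)
    joint[(x, g, g * x + z)] += p


def entropy(axes):
    """Entropy in bits of the marginal over the given coordinate indices."""
    marg = defaultdict(float)
    for key, p in joint.items():
        marg[tuple(key[a] for a in axes)] += p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)


X, G, Y = 0, 1, 2
i_x_g = entropy([X]) + entropy([G]) - entropy([X, G])
i_x_y = entropy([X]) + entropy([Y]) - entropy([X, Y])
i_x_y_given_g = entropy([X, G]) + entropy([Y, G]) - entropy([G]) - entropy([X, G, Y])
i_x_g_given_y = entropy([X, Y]) + entropy([G, Y]) - entropy([Y]) - entropy([X, G, Y])

print(i_x_g, i_x_y_given_g, i_x_y + i_x_g_given_y)
```

Since X and G are independent in this toy model, the first printed value is zero up to rounding and the last two agree, so knowing the sensing gain can only add to the information that Y carries about X.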

Using the above idea, in this paper we will show the effect of sensing diversity on the performance,
a concept which is explained next. Under the sensing model prescribed above, at each sensor one
can regard each component of the corresponding projection vector as contributing towards diversity in
sensing. The total number of non-zero components in the projection vector is called the sensing diversity.
This terminology is analogous to that used in MIMO systems in the context of communications. As will
be shown later, the loss in sensing capacity is not very significant at reasonable levels of sensing
diversity (with randomization in sampling per sensor). In fact there is a saturation effect that comes into
play, which implies that most of the gains can be obtained at a diversity factor close to 0.5. Now if one
considers the noiseless case, i.e. $\mathbf{Y} = \mathbf{G}\mathbf{X}$, then it was shown in [3] that for some $m$ and for some
sparsity $k$ as a function of $n$ and the coherence of the sensing matrix, an $\ell_1$ optimization problem:

$$\min \|\mathbf{X}\|_1 \quad \text{subject to: } \mathbf{Y} = \mathbf{G}\mathbf{X},\ \mathbf{X} \ge 0$$

yields the exact solution. To this end note that if $\mathbf{G}$ is sparse then solving the above system is
computationally faster, as is shown in [12].

There are other types of modalities that arise in the context of resource constrained sensor networks. 
As an example consider the application in [7] where each camera may be physically restricted to sample 
contiguous locations in space or under limited memory it is restricted to sample few locations, possibly 
at random. This motivates us to consider other structures on G under such modalities of operation. In this 
paper we will contrast random sampling and contiguous sampling and show that random sampling is better 
than contiguous sampling. In such scenarios it becomes important to address a coverage question, since
limited coverage may in some cases lead to poor performance. In highly resource constrained scenarios
randomization in the elements of G is not feasible. In this direction we also consider an ensemble of {0, 1} matrices, with
and without randomization in the locations of non-zero entries in each row. To facilitate the reading of 
the paper we itemize the organization as follows. 

1. We present the problem set-up in section II where we make precise the signal models and the
ensembles of sensing matrices that will be considered in relation to different sensor networking
scenarios.

2. In section III we will present the lower bounds to the probability of error in reconstruction subject
to an average distortion criterion. The development is fairly general and is self-contained.

3. In section IV we will present a constructive upper bound to the probability of error in reconstruction
subject to an average $\ell_2$ distortion criterion. The development there is particular to the fixed SNR
linear sensing model that is the subject of the present paper, though the ideas are in general applicable
to other sensing models and to other classes of distortion measures.

4. Once we establish the upper and lower bounds, we will use the results to obtain upper and lower
bounds to sensing capacity for the fixed SNR linear sensing models, in sections V and VI. In these
sections we will consider the full diversity Gaussian ensemble for the sensing matrix. The motivation
to consider this model is that the mutual information and moment generating functions are easier to
evaluate for the Gaussian ensemble. This is thus useful to gain initial insights into the tradeoffs of
signal sparsity and SNR.

5. Since the bounds to sensing capacity can be interpreted as providing bounds for the number of
projections/sensors for reliable monitoring, in section VII we will compare the scaling implied by bounds
to sensing capacity to that obtained in [9] in the context of a complexity penalized regularization
framework.

6. In section VIII we consider the effect of the structure of the sensing matrix G on sensing capacity.
The section is divided into several subsections. We begin by considering the effect of sensing diversity
on sensing capacity. Following that we consider the effect of correlation in the columns of G on
achievable sensing capacity. Then we consider a very general case of a deterministic sensing matrix
and by upper bounding the mutual information we comment on the performance of various types
of sensing architectures of interest.

7. In section IX we consider the {0, 1} ensemble for sensing matrices and provide upper bounds to
sensing capacity for various modalities in sensing.

8. In section X we give an example of how our methods can be extended to handle cases when one is
interested in reconstruction of functions of X rather than X itself. In this direction we will consider
the case of recovery of sign patterns of X.

II. Problem Set-up 

Assume that the underlying signal $\mathbf{X}$ lies in an $n$-dimensional space $\mathcal{X}^n$, where $\mathcal{X}$ can be discrete or
continuous. Discrete $\mathcal{X}$ models scenarios of detection or classification and continuous $\mathcal{X}$ models scenarios
of estimation.

a) Fixed SNR model: The observation model for the sensors is a linear observation model and is
given by

$$\mathbf{Y} = \sqrt{\mathrm{SNR}}\, \mathbf{G}\mathbf{X} + \mathbf{N} \tag{6}$$

which is the fixed SNR model as described in the introduction. The matrix $\mathbf{G} \in \mathbb{R}^{m \times n}$ is a random
matrix selected from an ensemble which we will state subsequently. For all $m, n$ each row of $\mathbf{G}$ is
restricted to have a unit $\ell_2$ norm. The noise vector $\mathbf{N}$ is i.i.d. Gaussian with unit variance in each dimension.

A. Discussion about fixed SNR model 

At this point it is important to bring out a distinction between the assumption, and subsequent
analysis, of a fixed SNR model and similar scenarios that have been considered, albeit in a high SNR setting.
The observation model of equation (6) studied in this paper is related to a class of problems that have
been central in statistics. In particular it is related to the problem of regression for model order selection.
In this context the subsets of columns of the sensing matrix G form a model for signal representation
which needs to be estimated from the given set of observations. Nature selects this subset in a
weighted/non-weighted way as modeled by X. The task is then to estimate this model order and thus
X. In other words, the estimate of X in most cases is also linked to the estimate of the model order under
some mild assumptions on G. Several representative papers in this direction are [13], [14], [15], which
consider the performance of several (signal) complexity penalized estimators in both parametric and
non-parametric frameworks. One of the key differences to note here is that the analysis of these algorithms
is done for the case when $\mathrm{SNR} \to \infty$, i.e. in the limit of high SNR, which is reflected by taking the
additive noise variance to go to zero or by not considering the noise at all. However SNR is an important
and necessary ingredient for applications related to sensor networks and therefore we will not pursue a
high SNR development here. Nevertheless the results obtained are directly applicable to such scenarios.

In the next section we will first outline prior distribution(s) on X that reflect the sparsity of the signal
X, and the model for realizing sensing diversity in the sensing matrix G. Then we will outline the choices
of ensembles for the sensing matrix G. In the following $\mathcal{N}(m, \sigma^2)$ denotes the Gaussian distribution with
mean $m$ and variance $\sigma^2$.

B. Generative models of signal sparsity and sensing diversity 

b) Signal sparsity: In a Bayesian set-up we model the sparsity of the phenomenon by assuming
a mixture distribution on the signals $\mathbf{X}$. In particular the $n$-dimensional vector $\mathbf{X} = X_1, \ldots, X_n$ is a
sequence drawn i.i.d. from a mixture distribution

$$P_X = \alpha\, \mathcal{N}(m_1, \sigma_1^2) + (1 - \alpha)\, \mathcal{N}(m_0, \sigma_0^2)$$

where $\alpha \le \frac{1}{2}$. In this paper we consider two cases.

1) Discrete Case: $m_1 = 1$, $m_0 = 0$ and $\sigma_1 = \sigma_0 = 0$. This means that $\mathbf{X}$ is a Bernoulli($\alpha$)
sequence. This models the discrete case for addressing problems of target localization, search, etc.

2) Continuous Case: $m_1 = m_0 = 0$ but $\sigma_1 = 1$ and $\sigma_0 = 0$. This models the continuous case.

In this context we call $\alpha$ the sparsity ratio, which is held fixed for all values of $n$. Under the above
model, on average the signal will be $k$-sparse where $k = \alpha n$. Note that $k \to \infty$ as $n \to \infty$.
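
The two signal priors can be sampled as follows. This is a minimal sketch under the reading that each component is nonzero with probability equal to the sparsity ratio and exactly zero otherwise; the function name `sample_signal`, the dimension and the sparsity value are hypothetical choices for illustration.

```python
import random


def sample_signal(n, alpha, continuous, rng):
    """Draw X_1..X_n i.i.d.: nonzero with probability alpha, else exactly 0.

    Discrete case: the nonzero value is 1 (Bernoulli(alpha) sequence).
    Continuous case: the nonzero value is drawn from N(0, 1).
    """
    x = []
    for _ in range(n):
        if rng.random() < alpha:
            x.append(rng.gauss(0.0, 1.0) if continuous else 1.0)
        else:
            x.append(0.0)
    return x


rng = random.Random(1)       # fixed seed, illustrative only
n, alpha = 10000, 0.05       # hypothetical dimension and sparsity ratio
x_disc = sample_signal(n, alpha, continuous=False, rng=rng)
x_cont = sample_signal(n, alpha, continuous=True, rng=rng)
k_disc = sum(1 for v in x_disc if v != 0.0)
k_cont = sum(1 for v in x_cont if v != 0.0)
print(k_disc, k_cont, alpha * n)
```

Both counts of nonzero entries concentrate around the product of the sparsity ratio and the dimension, which is the average sparsity level described above.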
c) Sensing diversity and ensemble for $\mathbf{G}$: In connection to the model for diversity, the sensing
matrix $\mathbf{G}$ is a random matrix such that for each row $i$, $G_{ij}$, $j = 1, 2, \ldots, n$ are distributed i.i.d. according to
a mixture distribution, $(1 - \beta)\, \mathcal{N}(m_0, \sigma_0^2) + \beta\, \mathcal{N}(m_1, \sigma_1^2)$. We consider three cases:

1) Gaussian ensemble: $m_1 = m_0 = 0$ and $\sigma_1 = 1$; $\sigma_0 = 0$.

2) Deterministic $\mathbf{G}$: The matrix $\mathbf{G}$ is deterministic.

3) $\{0, 1\}^{m \times n}$ ensemble: $m_1 = 1$; $m_0 = 0$ and $\sigma_1 = \sigma_0 = 0$.

The matrix is then normalized so that each row has a unit $\ell_2$ norm. In this context we call $\beta$ the
(sensing) diversity ratio. Under the above model, on average each sensor will have a diversity of
$l = \beta n$. Note that $l \to \infty$ as $n \to \infty$. Given the set-up as described above the problem is to find upper
and lower bounds to

$$C(d_0) = \limsup \left\{ \frac{n}{m} : \Pr\left( \frac{1}{n} d(\mathbf{X}, \hat{\mathbf{X}}(\mathbf{Y})) > d_0 \right) \to 0 \right\}$$

where $\hat{\mathbf{X}}(\mathbf{Y})$ is the reconstruction of $\mathbf{X}$ from observation $\mathbf{Y}$ and where
$d(\mathbf{X}, \hat{\mathbf{X}}(\mathbf{Y})) = \sum_{i=1}^{n} d(X_i, \hat{X}_i(\mathbf{Y}))$
for some distortion measure $d(\cdot,\cdot)$ defined on $\mathcal{X} \times \mathcal{X}$. In this paper we will consider the Hamming distortion
measure for discrete $\mathbf{X}$ and the squared distortion measure for continuous $\mathbf{X}$. Under this set-up we exhibit
the following main results:

1) Sensing capacity $C(d_0)$ is a function of SNR, signal sparsity and sensing diversity.

2) For a fixed SNR, sensing capacity goes to zero as sparsity goes to zero.

3) Low diversity implies low sensing capacity.

4) Correlations across the columns and across the rows of G lead to a decrease in sensing capacity.

5) For the {0, 1} ensemble for sensing matrices, sensing capacity for random sampling is higher than
for contiguous sampling.
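
The random ensembles for the sensing matrix defined in this section can be sampled as in the following minimal sketch. The function name `sample_sensing_matrix`, the sizes and the diversity ratio are hypothetical, and redrawing a row that happens to come out all zero (so that it can be normalized) is an assumption made here that the text does not address.

```python
import math
import random


def sample_sensing_matrix(m, n, beta, binary, rng):
    """Each entry is nonzero with probability beta, then rows get unit l2 norm.

    binary=False: nonzero entries are N(0, 1) (Gaussian ensemble).
    binary=True:  nonzero entries equal 1 ({0, 1} ensemble).
    """
    G = []
    for _ in range(m):
        row = [0.0] * n
        while not any(row):          # redraw an all-zero row (assumption)
            row = [(1.0 if binary else rng.gauss(0.0, 1.0))
                   if rng.random() < beta else 0.0 for _ in range(n)]
        norm = math.sqrt(sum(g * g for g in row))
        G.append([g / norm for g in row])
    return G


rng = random.Random(2)             # fixed seed, illustrative only
m, n, beta = 50, 400, 0.25         # hypothetical sizes and diversity ratio
G_gauss = sample_sensing_matrix(m, n, beta, binary=False, rng=rng)
G_binary = sample_sensing_matrix(m, n, beta, binary=True, rng=rng)
avg_diversity = sum(sum(1 for g in row if g != 0.0) for row in G_binary) / m
print(avg_diversity, beta * n)
```

The average number of nonzero entries per row is close to the diversity ratio times the dimension, and setting the diversity ratio to 1 recovers the full diversity case.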

In the next section we will provide asymptotic lower bounds on the probability of error in reconstruction
subject to a distortion criterion. Following that we will provide a constructive upper bound to the probability
of error. We will then use these results to evaluate upper and lower bounds to sensing capacity. In the
following we will use $\mathbf{X}$ and $X^n$ interchangeably.

III. Bounds to the performance of estimation algorithms: lower bounds 

Lemma 3.1: Given observation(s) $\mathbf{Y}$ for the sequence $X^n = \{X_1, \ldots, X_n\}$ of random variables
drawn i.i.d. according to $P_X$, let $\hat{X}^n(\mathbf{Y})$ be the reconstruction of $X^n$ from $\mathbf{Y}$. Also given is a distortion
measure $d(X^n, \hat{X}^n(\mathbf{Y})) = \sum_{i=1}^{n} d(X_i, \hat{X}_i(\mathbf{Y}))$. Then

$$\Pr\left( \frac{1}{n} d(\hat{X}^n(\mathbf{Y}), X^n) \ge d_0 \right) \ge \frac{R_X(d_0) - K(n, d_0) - \frac{1}{n} I(X^n; \mathbf{Y})}{R_X(d_0)} - o(1)$$

where $K(n, d_0)$ is bounded by a constant and where $R_X(d_0)$ is the corresponding (scalar) rate distortion
function for $X$.

Proof: See Appendix. ■

Essentially, $K(n, d_0) = \frac{1}{n} \times \log(\#$ neighbors of a quantization point in an optimal $n$-dimensional
rate-distortion mapping$)$. NOTE: The assumption of a scalar valued process in Lemma 3.1 is taken for the
sake of simplicity. The results are easily generalizable and can be extended to the case of vector valued
processes.

For the simpler case of a discrete parameter space, the lower bound to the minimax error in a parameter
estimation framework is related to the Bayesian error as follows,

$$\begin{aligned}
\min_{\hat{\mathbf{X}}(\mathbf{Y})} \max_{\mathbf{X} \in \Theta} \Pr\left( \frac{1}{n} d(\mathbf{X}, \hat{\mathbf{X}}(\mathbf{Y})) \ge d_0 \right) &= \min_{\hat{\mathbf{X}}(\mathbf{Y})} \max_{P_\Theta \in \mathcal{P}_\Theta} \sum_{\mathbf{X} \in \Theta} P_\Theta(\mathbf{X}) \Pr\left( \frac{1}{n} d(\mathbf{X}, \hat{\mathbf{X}}(\mathbf{Y})) \ge d_0 \right) && (7) \\
&\ge \min_{\hat{\mathbf{X}}(\mathbf{Y})} \sum_{\mathbf{X} \in \Theta} \pi(\mathbf{X}) \Pr\left( \frac{1}{n} d(\mathbf{X}, \hat{\mathbf{X}}(\mathbf{Y})) \ge d_0 \right) && (8)
\end{aligned}$$

where $\Theta$ is the parameter space, $\mathcal{P}_\Theta$ is the class of probability measures over $\Theta$ and $\pi \in \mathcal{P}_\Theta$ is any
particular distribution. The above result holds true for the case of a continuous parameter space under some
mild technical conditions. Thus a lower bound to the probability of error as derived in this paper also
puts a lower bound on the probability of error for the parametric set-up. In our set-up we will choose $\pi$
as a probability distribution that appropriately models the a priori information on $\mathbf{X}$, e.g. signal sparsity.
For modeling simple priors such as sparsity on $\mathbf{X}$ one can choose distributions that asymptotically put
most of the mass uniformly over the relevant subset of $\Theta$; this is a key ingredient in the realization of the
lower bound on the probability of error derived in this paper.

We have the following corollary that follows from Lemma 3.1.

Corollary 3.1: Let $X^n = X_1, \ldots, X_n$ be an i.i.d. sequence where each $X_i$ is drawn according to
some distribution $P_X(x)$ and $X^n \in \mathcal{X}^n$, where $|\mathcal{X}|$ is finite. Given observation $\mathbf{Y}$ about $X^n$ we have,

A. Tighter bounds for discrete X under Hamming distortion

The results in the previous section can be stated for any finite $n$ without resorting to the use of the AEP for
the case of discrete alphabets, with Hamming distortion as the distortion measure and for certain values
of the average distortion constraint $d_0$. We have the following lemma.

Lemma 3.2: Given observation(s) $\mathbf{Y}$ for the sequence $X^n = \{X_1, \ldots, X_n\}$ of random variables
drawn i.i.d. according to $P_X$. Then under the Hamming distortion measure $d_H(\cdot,\cdot)$, for $X_i \in \mathcal{X}$, $|\mathcal{X}| < \infty$
and for distortion levels $d_0 \le (|\mathcal{X}| - 1) \min_{x \in \mathcal{X}} P_X(x)$,

$$\Pr\left( \frac{1}{n} d_H(X^n, \hat{X}^n(\mathbf{Y})) \ge d_0 \right) \ge \frac{n H(X) - n\left( h(d_0) + d_0 \log(|\mathcal{X}| - 1) \right) - I(X^n; \mathbf{Y}) - 1}{n \log(|\mathcal{X}|) - n\left( h(d_0) + d_0 \log(|\mathcal{X}| - 1) \right)}$$

Proof: See Appendix.



B. Comment on the proof technique 

The proof of Lemma 3.1 closely follows the proof of Fano's inequality [16], where we start with a
distortion error event based on $\frac{1}{n} d(\hat{\mathbf{X}}(\mathbf{Y}), \mathbf{X}) \ge d_0$ and then evaluate the conditional entropy of a
rate-distortion mapping conditioned on the error event and the observation $\mathbf{Y}$. To bound $K(n, d_0)$, we use
results in [17] for the case of the squared distortion measure.

In relation to the lower bounds presented in this paper for the probability of error in reconstruction subject
to an average distortion level, one such development was considered in [18] in the context of a non-parametric
regression type problem. Let $\theta$ be an element of the metric space $(\Theta, d)$. We are given $\{Y_i, G_i\}_{i=1}^{m}$ for
some random or non-random vectors $G_i \in \mathbb{R}^n$, with $Y_i$ being the responses to these vectors under $\theta$.
Also given is the set of conditional pdfs $p_{\theta(G_i)}(Y_i)$, where the notation means that the pdfs
are parametrized by $\theta(G_i)$. The task is to find a lower bound on the minimax reconstruction distortion
under measure $d$, in reconstruction of $\theta$ given $\mathbf{Y}$ and $\mathbf{G}$. In our case one can identify $\mathbf{X} = \theta$ and
$\Theta = \mathcal{X}^n$ with squared metric $d$. For such a set-up lower bounds on the asymptotic minimax expected
distortion in reconstruction (not the probability of such an event) were derived in [18] using a variation
of Fano's bound (see [19]) under a suitable choice of worst case quantization for the parameter space
$\Theta = \{\text{space of } q\text{-smooth functions in } [0, 1]^n\}$ metrized with the $\ell_r$, $1 \le r \le \infty$, distance.

Our derivation has a flavor of this method in terms of identifying the right quantization, namely the
rate distortion quantization for a given level of average distortion in a Bayesian setting. Although we
evaluate the lower bounds to the probability of error and not the expected distortion itself, the lower
bound on the expected distortion in reconstruction follows immediately. Moreover our method works for
any distortion metric $d$, though in this paper we will restrict ourselves to cases of interest particular to
sensor network applications.

February 1, 2008 DRAFT

IV. Constructive upper bound to the probability of error 

In this section we will provide a constructive upper bound to the probability of error in reconstruction
subject to an average squared distortion level. Unlike the lower bounds, in this section we will provide
upper bounds for the particular observation model of equation (6). This could potentially be generalized
but we will keep our focus on the problem at hand.

To this end, given $\epsilon > 0$ and $n$, assume that we are given the functional mapping $f(X^n)$ (or $f(\mathbf{X})$) that
corresponds to the minimal cover at average distortion level $d_0$ as given by Lemma 11.2. Upon receiving
the observation $\mathbf{Y}$ the aim is to map it to the index corresponding to $f(\mathbf{X})$, i.e. we want to detect
which distortion ball the true signal belongs to. Clearly if $\mathbf{X}$ is not typical there is an error. From Lemma
11.1, the probability of this event can be bounded by an arbitrary $\delta > 0$ for a large enough $n$. So we
will not worry about this atypical event in the following.

Since all the sequences in the typical set are equiprobable, we convert the problem to a max-likelihood
detection set-up over the set of rate-distortion quantization points given by the minimal cover as follows.
Given $\mathbf{G}$ and the rate distortion points corresponding to the functional mapping $f(X^n)$, we enumerate
the set of points $\mathbf{G} Z_i^n \in \mathbb{R}^m$. Then given the observation $\mathbf{Y}$ we map $\mathbf{Y}$ to the nearest point (in $\mathbb{R}^m$)
$\mathbf{G} Z_i^n$. We then ask for the following probability,

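$$\Pr\left( \sqrt{\mathrm{SNR}}\, \mathbf{G} f(\mathbf{X}) \to \sqrt{\mathrm{SNR}}\, \mathbf{G} f(\mathbf{X}') \;\middle|\; \mathbf{G},\ \mathbf{X} \in B_i,\ \mathbf{X}' \in B_j : \frac{1}{n} d_{set}(B_i, B_j) \ge 2 d_0 \right)$$

that is, we are asking for the probability that in a typical max-likelihood detection set-up we
will map signals from distortion ball $B_i$ to signals in distortion ball $B_j$ that is at an average set distance
$\ge 2 d_0$ from $B_i$, where $d_{set}(B_i, B_j) = \min_{\mathbf{X} \in B_i, \mathbf{X}' \in B_j} d(\mathbf{X}, \mathbf{X}')$. For the sake of brevity we denote the above
probability by $P_e(pair)$ to reflect that it is a pairwise error probability. Since the noise is additive Gaussian
noise we have

$$P_e(pair) = \Pr\left( \mathbf{N}^T \mathbf{G}(\mathbf{X} - \mathbf{X}') \ge \frac{1}{2} \sqrt{\mathrm{SNR}}\, \|\mathbf{G}(\mathbf{X} - \mathbf{X}')\|^2 : \mathbf{X} \in B_i,\ \mathbf{X}' \in B_j \right)$$

Since the noise $\mathbf{N}$ is AWGN with unit variance in each dimension, its projection onto the unit vector
$\frac{\mathbf{G}(\mathbf{X} - \mathbf{X}')}{\|\mathbf{G}(\mathbf{X} - \mathbf{X}')\|}$ is also Gaussian with unit variance. Thus we have

$$P_e(pair) = \Pr\left( N \ge \frac{\sqrt{\mathrm{SNR}}}{2} \|\mathbf{G}(\mathbf{X} - \mathbf{X}')\| : \mathbf{X} \in B_i,\ \mathbf{X}' \in B_j \right)$$

By a standard approximation to the $Q(\cdot)$ (error) function, we have that,

$$P_e\left( f(\mathbf{X}) \to f(\mathbf{X}') \;\middle|\; \mathbf{X} \in B_i,\ \mathbf{X}' \in B_j,\ \mathbf{G} : \frac{1}{n} d_{set}(B_i, B_j) \ge 2 d_0 \right) \le \exp\left\{ -\frac{\mathrm{SNR}\, \|\mathbf{G}(\mathbf{X} - \mathbf{X}')\|^2}{8} \right\}$$

In the worst case we have the following bound,

$$P_e\left( f(\mathbf{X}) \to f(\mathbf{X}') \;\middle|\; \mathbf{X} \in B_i,\ \mathbf{X}' \in B_j,\ \mathbf{G} : \frac{1}{n} d_{set}(B_i, B_j) \ge 2 d_0 \right) \le \exp\left\{ -\min_{\mathbf{X} \in B_i, \mathbf{X}' \in B_j} \frac{\mathrm{SNR}\, \|\mathbf{G}(\mathbf{X} - \mathbf{X}')\|^2}{8} \right\}$$

Now note that the above construction implies that the average distortion in reconstruction of $\mathbf{X}$
is bounded by $2 d_0$ if the distortion metric obeys the triangle inequality. To evaluate the total probability of
error we use the union bound to get,

$$\Pr\left( \frac{1}{n} d(\mathbf{X}, \hat{\mathbf{X}}(\mathbf{Y})) \ge 2 d_0 \right) \le \exp\left\{ -\min_{\mathbf{X} \in B_i, \mathbf{X}' \in B_j} \frac{\mathrm{SNR}\, \|\mathbf{G}(\mathbf{X} - \mathbf{X}')\|^2}{8} \right\} 2^{n (R_X(d_0) - K(n, d_0))}$$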

We will use this general form and apply it to particular cases of ensembles of the sensing matrix G. 
In the following sections we begin by providing upper and lower bounds to the sensing capacity for the 
Gaussian ensemble for full diversity. 
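
The pairwise error step in the construction above can be checked numerically. The sketch below compares the exact Gaussian tail probability at the threshold $\frac{\sqrt{\mathrm{SNR}}}{2} \|\mathbf{G}(\mathbf{X} - \mathbf{X}')\|$ with the Chernoff-type bound $Q(t) \le \exp(-t^2/2)$. Which approximation to the $Q(\cdot)$ function the text intends is an assumption here, and the SNR and distance values, as well as the function names, are hypothetical.

```python
import math


def q_function(t):
    """Gaussian tail probability Q(t) = Pr(N >= t) for N ~ N(0, 1)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def pairwise_error(snr, dist):
    """Exact pairwise error Q(sqrt(SNR) * dist / 2), dist = ||G(X - X')||."""
    return q_function(math.sqrt(snr) * dist / 2.0)


def pairwise_error_bound(snr, dist):
    """Chernoff-type bound Q(t) <= exp(-t^2 / 2) at t = sqrt(SNR) * dist / 2."""
    return math.exp(-snr * dist ** 2 / 8.0)


snr = 10.0                                  # hypothetical SNR
for dist in (0.1, 0.5, 1.0, 2.0):           # hypothetical values of ||G(X - X')||
    exact = pairwise_error(snr, dist)
    bound = pairwise_error_bound(snr, dist)
    print(dist, exact, bound)
```

The exponential bound is loose for small separations but captures the decay in SNR and in the squared distance, which is what drives the union bound argument.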

V. Sensing Capacity: Upper bounds, Gaussian ensemble 

A. Discrete X, full diversity, Gaussian ensemble 

For this case we have the following main lemma. 

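Lemma 5.1: Given $\mathbf{X} \in \{0, 1\}^n$ drawn Bernoulli$(\alpha, 1 - \alpha)$ and $\mathbf{G}$ chosen from the Gaussian
ensemble. Then, with the Hamming distortion as the distortion measure, for a diversity ratio of $\beta = 1$
and for $d_0 < \alpha$, the sensing capacity $C$ is upper bounded by

$$C(d_0) \le \frac{\frac{1}{2} \log(1 + \alpha\, \mathrm{SNR})}{R_X(d_0)}$$

Proof: From Lemma 3.2 the probability of error is lower bounded by zero if the numerator in the
lower bound is negative; this implies for any $m, n$ that

$$\frac{n}{m} \le \frac{\frac{1}{m} I(\mathbf{X};\mathbf{Y}|\mathbf{G})}{R_X(d_0)}$$

Since $\mathbf{G}$ is random we take the expectation over $\mathbf{G}$. It can be shown that the mutual information

$$E_{\mathbf{G}} I(X^n;\mathbf{Y}|\mathbf{G}) \le \max_{P_X : \frac{1}{n} \sum_i E X_i^2 \le \alpha} \frac{1}{2} E_{\mathbf{G}} \log\det\left( I_{m \times m} + \mathrm{SNR}\, \mathbf{G}\mathbf{X}\mathbf{X}^T\mathbf{G}^T \right) = E_{\lambda_1, \ldots, \lambda_m} \sum_{i=1}^{m} \frac{1}{2} \log(1 + \lambda_i \alpha\, \mathrm{SNR})$$

where $\lambda_i$ are the singular values of $\mathbf{G}\mathbf{G}^T$. Since the rows of $\mathbf{G}$ have a
unit norm, $\lambda_i \le 1\ \forall i$. Hence $E_{\mathbf{G}} I(X^n;\mathbf{Y}|\mathbf{G}) \le \frac{m}{2} \log(1 + \alpha\, \mathrm{SNR})$. Thus the result follows. ■

Fig. 2. The plot of sparsity versus upper bounds to the sensing capacity for various SNRs for the binary case
($\mathcal{X} = \{0, 1\}$) for zero Hamming distortion.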



B. Continuous X, /m/Z diversity, Gaussian ensemble 

Lemma 5.2: Given $X \in \mathbb{R}^n$ drawn i.i.d. according to $P_X = \alpha\,\mathcal{N}(0,1) + (1-\alpha)\,\mathcal{N}(0,0)$ and G chosen from the Gaussian ensemble. Then, for the squared distortion measure, for diversity ratio $\beta = 1$ and for $d_0 \le \frac{\alpha}{2}$, the sensing capacity $C(d_0)$ obeys



$$C(d_0) \le \frac{\frac{1}{2}\log(1+\alpha\,\mathrm{SNR})}{R_X(d_0) - K(d_0,n)}$$



Proof: From Lemma 5.1 we have that $E_G I(X;Y|G) \le \frac{m}{2}\log(1+\alpha\,\mathrm{SNR})$. In order that the probability of error be lower bounded by zero, from Lemma 3.1 it follows that asymptotically

$$\frac{n}{m} \le \frac{\frac{1}{m} E_G I(X;Y|G)}{R_X(d_0) - K(d_0,n)}$$

It can be shown that $|K(d_0,n) - \log 2| < \epsilon$ with $\epsilon$ very small for large enough n; see e.g. [17]. The lemma then follows by plugging in the results from Section XI-C. ■

It can be easily seen that as $\alpha \downarrow 0$ the sensing capacity goes to zero. We illustrate this by plotting the upper bounds in Figure 2 for the discrete case. We will revisit this phenomenon in Section VII in relation to the bounds derived in [5] in the context of compressed sensing.
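As a rough numerical illustration of how the discrete upper bound collapses at low sparsity, the following sketch evaluates the bound of Lemma 5.1 for the binary case. It assumes base-2 logarithms and uses the standard rate distortion function of a Bernoulli($\alpha$) source under Hamming distortion, $R_X(d_0) = h_2(\alpha) - h_2(d_0)$ for $d_0 < \alpha \le 1/2$. The function names are illustrative and are not taken from the paper.

```python
import math

def h2(p):
    """Binary entropy in bits, with h2(0) = h2(1) = 0 by convention."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def capacity_upper_bound(alpha, snr, d0=0.0):
    """Lemma 5.1 bound: 0.5*log2(1 + alpha*SNR) / (h2(alpha) - h2(d0)).

    Assumes X in {0,1}^n, full diversity, and 0 <= d0 < alpha <= 1/2.
    """
    if not (0.0 <= d0 < alpha <= 0.5):
        raise ValueError("need 0 <= d0 < alpha <= 1/2")
    rate = h2(alpha) - h2(d0)  # rate distortion function of the source
    return 0.5 * math.log2(1.0 + alpha * snr) / rate
```

Evaluating `capacity_upper_bound(alpha, 10.0)` on a decreasing grid such as `alpha = 0.3, 0.1, 0.01, 0.001` gives a decreasing sequence, in line with the statement that the sensing capacity goes to zero as the sparsity goes to zero.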

VI. Sensing Capacity: Lower bounds, Gaussian ensemble 

A. Discrete alphabet, full diversity 

The discrete X with Hamming distortion is a special case where we can provide tighter upper bounds. The proof follows from the development in Section IV and identifying that for the discrete case one can choose the discrete set of points instead of the distortion balls. We have the following lemma.

Lemma 6.1: Given $X \in \mathcal{X}^n$ with $|\mathcal{X}| < \infty$, for $\beta = 1$ and G chosen from a Gaussian ensemble. Then for $d_0 \le \min_{x \in \mathcal{X}} P_X(x)$, a sensing capacity of

$$C(d_0) = \frac{\frac{1}{2}\log\left(1+\frac{\mathrm{SNR}\,d_0}{2}\right)}{H(X) - d_0\log(|\mathcal{X}|-1) - d_0\log\frac{1}{d_0}}$$

is achievable in that the probability of error goes down to zero exponentially for choices of $C = \frac{n}{m} = C(d_0) - \eta$ for any $\eta > 0$.
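Proof: We have

$$\Pr\left(\tfrac{1}{n} d(X,\hat{X}(Y)) \ge d_0 \,\middle|\, G\right) \le \exp\left(-\frac{\mathrm{SNR}\,\|G(X-X')\|^2}{4}\right) 2^{\,nH(X) - nd_0\log(|\mathcal{X}|-1) - \log\binom{n}{nd_0}}$$

where we have applied the union bound to all the typical sequences that are outside the Hamming distortion ball of radius $d_0$. Taking the expectation with respect to G we get

$$\Pr\left(\tfrac{1}{n} d(X,\hat{X}(Y)) \ge d_0\right) \le E_G \exp\left(-\frac{\mathrm{SNR}\,\|G(X-X')\|^2}{4}\right) 2^{\,nH(X) - nd_0\log(|\mathcal{X}|-1) - \log\binom{n}{nd_0}}$$

Now note that since G is a Gaussian random matrix where each row has unit $\ell_2$ norm, $\|G(X-X')\|^2 = \sum_{i=1}^{m} \left|\sum_{j=1}^{n} G_{ij}(X_j - X'_j)\right|^2$ is a sum of m independent $\chi^2$ random variables with mean $\frac{1}{n}\|X-X'\|^2$. Thus from the moment generating function of the $\chi^2$ random variable we get that

$$\Pr\left(\tfrac{1}{n} d(X,\hat{X}(Y)) \ge d_0\right) \le \left(\frac{1}{1+\frac{\mathrm{SNR}\,\|X-X'\|^2}{2n}}\right)^{m/2} 2^{\,nH(X) - nd_0\log(|\mathcal{X}|-1) - \log\binom{n}{nd_0}}$$

This implies

$$\Pr\left(\tfrac{1}{n} d(X,\hat{X}(Y)) \ge d_0\right) \le 2^{-\frac{m}{2}\log\left(1+\frac{\mathrm{SNR}\,d_0}{2}\right)}\, 2^{\,nH(X) - nd_0\log(|\mathcal{X}|-1) - \log\binom{n}{nd_0}}$$

Now note that for $d_0 \le \alpha$, $\log\binom{n}{nd_0} \ge nd_0\log\frac{1}{d_0}$. Then from the above one can see that the probability of error goes down to zero if

$$\frac{n}{m} < \frac{\frac{1}{2}\log\left(1+\frac{\mathrm{SNR}\,d_0}{2}\right)}{H(X) - d_0\log(|\mathcal{X}|-1) - d_0\log\frac{1}{d_0}}$$

Thus a sensing capacity of

$$C(d_0) = \frac{\frac{1}{2}\log\left(1+\frac{\mathrm{SNR}\,d_0}{2}\right)}{H(X) - d_0\log(|\mathcal{X}|-1) - d_0\log\frac{1}{d_0}}$$

is achievable in that the probability of error goes down to zero exponentially for choices of $C = \frac{n}{m} = C(d_0) - \eta$ for any $\eta > 0$.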
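To make the SNR gap between the achievable rate and the upper bound concrete, the sketch below evaluates both expressions for the binary alphabet. It assumes $|\mathcal{X}| = 2$ (so the $d_0\log(|\mathcal{X}|-1)$ term vanishes and $H(X) = h_2(\alpha)$), base-2 logarithms, and $R_X(d_0) = h_2(\alpha) - h_2(d_0)$ for the upper bound of Lemma 5.1. All function names are hypothetical.

```python
import math

def h2(p):
    """Binary entropy in bits, with h2(0) = h2(1) = 0 by convention."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def achievable_capacity_binary(alpha, snr, d0):
    """Achievable rate of Lemma 6.1 for X in {0,1}^n (binary alphabet)."""
    if not (0.0 < d0 <= min(alpha, 1.0 - alpha)):
        raise ValueError("need 0 < d0 <= min(alpha, 1 - alpha)")
    numerator = 0.5 * math.log2(1.0 + snr * d0 / 2.0)
    denominator = h2(alpha) - d0 * math.log2(1.0 / d0)
    return numerator / denominator

def upper_bound_binary(alpha, snr, d0):
    """Upper bound of Lemma 5.1 for the same source (requires d0 < alpha)."""
    return 0.5 * math.log2(1.0 + alpha * snr) / (h2(alpha) - h2(d0))
```

For example, with `alpha = 0.3`, `snr = 10.0` and `d0 = 0.05` the achievable rate lies strictly below the upper bound, and as `d0` tends to zero the achievable rate tends to zero, since the argument of the logarithm in its numerator tends to one.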



B. Continuous X, full diversity 

Lemma 6.2: [Weak Achievability] For $X \in \mathbb{R}^n$ drawn i.i.d. according to $P_X(X)$, G chosen from the Gaussian ensemble and $\beta = 1$, a sensing capacity of

$$C(2d_0) = \frac{\frac{1}{2}\log(1 + d_0\,\mathrm{SNR})}{R_X(d_0) - K(n,d_0)}$$

is achievable in that the probability of error goes down to zero exponentially with n for $C = \frac{n}{m} \le C(2d_0) - \epsilon$ for some arbitrary $\epsilon > 0$.

Proof: For this case we invoke the construction outlined in Section IV. From the results in that section we get that

$$\Pr\left(\tfrac{1}{n} d(X,\hat{X}(Y)) \ge 2d_0\right) \le \exp\left(-\min_{X \in B_i,\, X' \in B_j} \frac{\mathrm{SNR}\,\|G(X-X')\|^2}{4}\right) 2^{n(R_X(d_0)-K(n,d_0))}$$

Note that the result is a little weaker in that guarantees are only provided for reconstruction within $2d_0$, but one can appropriately modify the rate distortion codebook to get the desired average distortion level. Proceeding as in the case of discrete X and taking the expectation over G, we get that

$$\Pr\left(\tfrac{1}{n} d(X,\hat{X}(Y)) \ge 2d_0\right) \le \left(\frac{1}{1+\frac{\mathrm{SNR}}{2n}\min_{X \in B_i,\, X' \in B_j}\|X-X'\|^2}\right)^{m/2} 2^{n(R_X(d_0)-K(n,d_0))}$$

where, by construction, $\min_{X \in B_i,\, X' \in B_j} \|X-X'\|^2 \ge 2nd_0$.
Fig. 3. (a) Plots of upper and lower bounds to sensing capacity for the Gaussian mixture model, $\alpha = 0.3$, SNR = 10. (b) Plots of upper and lower bounds to sensing capacity for the Bernoulli model, $\alpha = 0.3$, SNR = 10. The distortion on the x-axis is mean squared distortion for the Gaussian case and Hamming distortion for the Bernoulli case. Note that the achievable sensing capacity at zero distortion is zero and that there is an SNR gap between the upper and lower bounds.



This implies

$$\Pr\left(\tfrac{1}{n} d(X,\hat{X}(Y)) \ge 2d_0\right) \le \left(\frac{1}{1+\mathrm{SNR}\,d_0}\right)^{m/2} 2^{n(R_X(d_0)-K(n,d_0))} = 2^{-\frac{m}{2}\log(1+\mathrm{SNR}\,d_0)}\, 2^{n(R_X(d_0)-K(n,d_0))}$$

This implies that for

$$\frac{n}{m} < \frac{\frac{1}{2}\log(1 + d_0\,\mathrm{SNR})}{R_X(d_0) - K(n,d_0)}$$

the probability of error goes to zero exponentially. This means that a sensing capacity of

$$C(2d_0) = \frac{\frac{1}{2}\log(1 + d_0\,\mathrm{SNR})}{R_X(d_0) - K(n,d_0)}$$

is achievable in that the probability of error goes down to zero exponentially with n for $C = \frac{n}{m} \le C(2d_0) - \eta$ for some arbitrary $\eta > 0$. ■

Plots of the upper and lower bounds are shown in Figure 3.
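The step from the exponential bound to the $(1+\mathrm{SNR}\,d_0)^{-m/2}$ factor in the proofs of Lemmas 6.1 and 6.2 rests on the moment generating function of a sum of squared Gaussians. The following sketch checks that identity by simulation. It assumes that each inner product $G_i^T(X-X')$ is zero-mean Gaussian with variance `sigma2`, and the Monte Carlo routine and its parameters are illustrative only.

```python
import math
import random

def chi2_mgf_closed_form(t, sigma2, m):
    """E[exp(-t*S)] where S is a sum of m squared i.i.d. N(0, sigma2) variables."""
    return (1.0 + 2.0 * t * sigma2) ** (-m / 2.0)

def chi2_mgf_monte_carlo(t, sigma2, m, trials=50_000, seed=0):
    """Monte Carlo estimate of the same expectation (seeded for repeatability)."""
    rng = random.Random(seed)
    sigma = math.sqrt(sigma2)
    total = 0.0
    for _ in range(trials):
        s = sum(rng.gauss(0.0, sigma) ** 2 for _ in range(m))
        total += math.exp(-t * s)
    return total / trials
```

With `t = snr / 4` and `sigma2 = 2 * d0` (the worst-case variance when $\|X-X'\|^2 = 2nd_0$), the closed form reduces to `(1 + snr * d0) ** (-m / 2)`, which is the factor appearing in the continuous case.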
VII. Comparison with existing bounds

Note that the results in this paper are stated for $d_0 \le \alpha$ for the discrete case and for $d_0 \le \frac{\alpha}{2}$ for the continuous case. This is because one must consider stricter average distortion measures as the phenomenon becomes sparser. To bring out this point concretely and for purposes of comparison with existing bounds, we consider the result obtained in [5] based on an optimal complexity regularized estimation framework. They show that the expected mean squared error in reconstruction is upper bounded by



$$E\left[\frac{\|X-\hat{X}\|^2}{n}\right] \le C_1 C_2 \frac{k\log n}{m} \qquad (9)$$



where $C_1 \sim 1$ and $C_2 \sim 50(P+\sigma)^2\{(1+p)\log 2 + 4\}$, under normalization of the signal and the noise power, and p is the number of quantization levels [9]. To this end consider an extremely sparse case, i.e., k = 1. Then the average distortion metric in equation (9) does not adequately capture the performance, as one can always declare all zeros to be the estimated vector and the distortion is then upper bounded by $O(\frac{1}{n})$. Consider the case when X is extremely sparse, i.e. $\alpha \downarrow 0$ as $\frac{1}{n}$. Then the right comparison is to evaluate the average distortion per non-zero element, $E\left[\frac{1}{\alpha n}\|X-\hat{X}\|^2\right]$. Using this as the performance metric we have from equation (9)

$$E\left[\frac{1}{\alpha n}\|X-\hat{X}\|^2\right] \le C_1 C_2 \frac{n\log n}{m} \qquad (10)$$



When $\alpha$ is small, the average number of projections required such that the distortion per non-zero element is bounded by a constant scales as $O(n\log n)$. This is indeed consistent with our results, in that the sensing capacity goes down to zero as $\frac{1}{\log n}$.

X is sparse, i.e. $\alpha < 1$ but not very small. From the results on achievable sensing capacity we have that

$$\Pr\left(\tfrac{1}{n}\|X-\hat{X}\|^2 \ge d_0\right) \le 2^{-\frac{m}{2}\log(1 + d_0\,\mathrm{SNR}/2) + n(R_X(d_0)-K(n,d_0))}$$

In order to compare the results we fix a performance guarantee of $\Pr(d(X,\hat{X}) \ge d_0) \le \epsilon$ for a given $\epsilon > 0$. We then have for the minimal number of projections required that

$$m \ge \frac{2\left(\log(1/\epsilon) + n(R_X(d_0) - K(n,d_0))\right)}{\log(1 + d_0\,\mathrm{SNR}/2)}$$
from our results. From the results in [9], applying Markov's inequality to (9) with $k = \alpha n$ gives

$$m \ge C_1 C_2 \frac{\alpha n\log n}{d_0\,\epsilon}$$

Fig. 4. The difference in scaling of the number of projections with the sparsity rate from bounds derived from Sensing Capacity and from bounds obtained in [9]. Our bounds are sharper.

For the special case of a binary alphabet we have the following scaling orders for the number of projections in both cases: from achievable sensing capacity we have $m_1 \ge O(nH_2(\alpha))$ and from the results in [9] we have $m_2 \ge O(\alpha n\log n)$. A plot of these orders as a function of $\alpha$ for a fixed n is shown in Figure 4.
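The two scaling orders can be compared numerically. The sketch below drops all constants (so only the orders $nH_2(\alpha)$ and $\alpha n\log n$ are compared) and assumes base-2 logarithms throughout. The function names are illustrative.

```python
import math

def h2(p):
    """Binary entropy in bits, with h2(0) = h2(1) = 0 by convention."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def projections_sensing_capacity(n, alpha):
    """Order n*H2(alpha) from the achievable sensing capacity (constants dropped)."""
    return n * h2(alpha)

def projections_reference_9(n, alpha):
    """Order alpha*n*log(n) from the bound of [9] (constants dropped)."""
    return alpha * n * math.log2(n)
```

For a fixed `n = 1000` and sparsity rates between 0.01 and 0.5, the first order is the smaller of the two. Near `alpha = 1 / n` the two orders become comparable, because $h_2(\alpha) \approx \alpha\log(1/\alpha)$ for small $\alpha$.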

VIII. Effect of structure of G 

In this section we will show the effect of the structure of G on sensing capacity. This section is divided into several subsections and the discussion is self-contained. In Section VIII-A we will show that for the Gaussian ensemble, the sensing capacity is reduced when diversity is low. Following that, in Section VIII-B we will show the effect of correlation across columns in the sensing matrix for the Gaussian ensemble on achievable sensing capacity. In Section VIII-C we will present a general result for a generic sensing matrix G which will subsequently be used to highlight the effect of structures such as that induced via random filtering using an FIR filter with/without downsampling, as considered in [20].

A. Effect of sensing diversity, Gaussian ensemble

In order to show the effect of sensing diversity we evaluate the mutual information $E_G I(X;Y|G)$ using the intuition described in the introduction. To this end we have the following lemma.
Fig. 5. The gap between upper bounds to sensing capacity in very low diversity and full diversity for the binary alphabet case. Shown also is the Sensing Capacity as a function of diversity for fixed sparsity. Note the saturation effect with diversity ratio.

Lemma 8.1: For a diversity ratio of $\beta$, with $l = \beta n$ as the average diversity per sensor and an average sparsity level of $k = \alpha n$, we have

$$E_G I(X;Y|G) \le \frac{m}{2} E_j\left[\log\left(\frac{j\,\mathrm{SNR}}{l} + 1\right)\right] \qquad (11)$$

where the expectation is evaluated over the distribution

$$\Pr(j) = \frac{\binom{k}{j}\binom{n-k}{l-j}}{\binom{n}{l}}$$

Proof: See Appendix. ■
In the above lemma j plays the role of the number of overlaps between the projection vector and the sparse signal. As the diversity reduces this overlap reduces and the mutual information decreases. We will illustrate this by considering the extreme case when $\beta \downarrow 0$ with n as $\frac{1}{n}$, i.e. $l = 1$. For this case we have

$$I(X;Y|G) \le \frac{m}{2} E_j\left[\log(j\,\mathrm{SNR} + 1)\right] = \frac{m}{2}\left[(1-\alpha)\log(\mathrm{SNR}\cdot 0 + 1) + \alpha\log(\mathrm{SNR} + 1)\right] = \frac{m\alpha}{2}\log(1 + \mathrm{SNR})$$

The effect is illustrated in Figure 5. Thus low sensing diversity implies low sensing capacity.
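The expectation in Lemma 8.1 can be evaluated directly for finite n. The sketch below computes the per-measurement bound $\frac{1}{2}E_j[\log_2(1 + j\,\mathrm{SNR}/l)]$ under the hypergeometric overlap distribution stated in the lemma. It assumes integer k and l and base-2 logarithms, and the function names are hypothetical.

```python
import math

def overlap_pmf(j, n, k, l):
    """Pr(j): overlaps between l sensed positions and k non-zeros among n."""
    return math.comb(k, j) * math.comb(n - k, l - j) / math.comb(n, l)

def mi_bound_per_measurement(n, k, l, snr):
    """(1/m) times the Lemma 8.1 bound: 0.5 * E_j[log2(1 + j*SNR/l)]."""
    j_min = max(0, l - (n - k))   # smallest feasible overlap
    j_max = min(k, l)             # largest feasible overlap
    return sum(
        overlap_pmf(j, n, k, l) * 0.5 * math.log2(1.0 + j * snr / l)
        for j in range(j_min, j_max + 1)
    )
```

With `l = 1` the sum reduces to `0.5 * (k / n) * log2(1 + snr)`, the low-diversity expression above, while with `l = n` the overlap equals k with probability one and the value is `0.5 * log2(1 + (k / n) * snr)`, the full-diversity bound of Lemma 5.1.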
B. Effect of correlation in G on achievable sensing capacity

In this section we will show that correlation in the sensing matrix G reduces achievable capacity. Correlation in G can arise for many physical reasons, such as correlated scattering or correlation of gains across sensing modalities, which may arise due to the physical construction of the sensor. Naturally there can be direct relations between various phenomena that can lead to such correlation. This is captured by assuming that there is correlation across the columns of G. Consider the upper bound to the probability of error as derived in Section IV,

$$\Pr\left(\tfrac{1}{n} d(X,\hat{X}(Y)) \ge 2d_0\right) \le \exp\left(-\min_{X \in B_i,\, X' \in B_j} \frac{\mathrm{SNR}\,\|G(X-X')\|^2}{4}\right) 2^{n(R_X(d_0)-K(n,d_0))}$$

In the above expression, consider the term

$$\mathrm{SNR}\,\|G(X-X')\|^2 = \mathrm{SNR}\sum_{i=1}^{m}\left|\sum_{j=1}^{n} G_{ij}(X_j - X'_j)\right|^2$$

where $\sum_{j=1}^{n} G_{ij}(X_j - X'_j)$ for each i are independent Gaussian random variables with zero mean and variance given by $\Delta^T \Sigma_{G_i} \Delta$, where $\Delta$ is the vector $\Delta = X - X'$ and $\Sigma_{G_i}$ is the covariance matrix (symmetric and positive semi-definite) of the i-th row of G. By construction, we know that $\frac{1}{n}\Delta^T\Delta \ge 2d_0$, and note that in the worst case

$$\min \Delta^T \Sigma_{G_i} \Delta = \lambda_{\min}\,\Delta^T\Delta$$

where $\lambda_{\min}$ is the minimum eigenvalue of the normalized covariance matrix $\Sigma_{G_i}$. Proceeding in a manner similar to that in the proof of Lemma 6.2 we have that

$$\Pr\left(\tfrac{1}{n} d(X,\hat{X}(Y)) \ge 2d_0\right) \le \left(\frac{1}{1 + d_0\,\mathrm{SNR}\,\lambda_{\min}}\right)^{m/2} 2^{n(R_X(d_0)-K(n,d_0))}$$

From the above expression one can see that the achievable sensing capacity falls in general, since $\lambda_{\min} \le 1$, as compared to the case when the elements of G are uncorrelated, in which case $\lambda_{\min} = 1 = \lambda_{\max}$.
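As a toy illustration of the role of $\lambda_{\min}$, the sketch below uses an equicorrelated row covariance, $\Sigma = (1-\rho)I + \rho\,\mathbf{1}\mathbf{1}^T$ with $0 \le \rho < 1$, for which the minimum eigenvalue is $1-\rho$. This covariance model is an assumption made purely for illustration and is not taken from the paper. The per-measurement exponent $\frac{1}{2}\log_2(1 + d_0\,\mathrm{SNR}\,\lambda_{\min})$ is then compared with the uncorrelated case.

```python
import math

def equicorrelated_cov(n, rho):
    """Hypothetical row covariance: ones on the diagonal, rho elsewhere."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]

def quadratic_form(cov, v):
    """Compute v^T * cov * v for a list-of-lists matrix."""
    size = len(v)
    return sum(v[i] * cov[i][j] * v[j] for i in range(size) for j in range(size))

def error_exponent_per_measurement(snr, d0, lam_min):
    """0.5*log2(1 + d0*SNR*lam_min), the decay rate per measurement."""
    return 0.5 * math.log2(1.0 + d0 * snr * lam_min)

# The difference of two coordinates attains the minimum eigenvalue 1 - rho.
rho = 0.5
cov = equicorrelated_cov(6, rho)
v = [1.0, -1.0, 0.0, 0.0, 0.0, 0.0]
rayleigh = quadratic_form(cov, v) / sum(x * x for x in v)
```

Here `rayleigh` equals `1 - rho`, and `error_exponent_per_measurement(snr, d0, 1 - rho)` is smaller than the value obtained with `lam_min = 1`, so more measurements are needed for the same error decay.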

C. Deterministic G 

In this section we will consider deterministic matrices G and provide upper bounds to sensing capacity for the general case. To this end denote the rows of G as $G_i$, $i = 1, 2, \dots, m$. Let the cross-correlations of these rows be denoted as

$$r_i = \frac{G_i^T G_{i+1}}{G_i^T G_i}$$

As before, to ensure that the SNR is fixed we impose $G_i^T G_i = 1$ for all i. Then we have the following result:

Lemma 8.2: For the generative models for the signal X as outlined in the problem set-up, an upper bound for the sensing capacity for a deterministic sensing matrix $G \in \mathbb{R}^{m \times n}$ is given by

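$$C(d_0) \le \frac{\frac{1}{2m}\sum_{i=1}^{m-1}\log\left(1 + \mathrm{SNR}\,\alpha(1-r_i) + \frac{r_i\alpha\,\mathrm{SNR}}{\alpha\,\mathrm{SNR}+1}\left(1 + \alpha\,\mathrm{SNR}(1-r_i)\right)\right)}{R_X(d_0) - K(n,d_0)} \qquad (12)$$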

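Proof: We will evaluate $I(X;Y|G)$ via the straightforward method,

$$I(X;Y|G) = h(Y|G) - h(Y|G,X)$$

Note that $h(Y|G,X) = h(N)$. Note that $h(Y|G) \le h(Y) \le h(Y^*)$, where $Y^*$ is a Gaussian random vector obtained via $GX^*$ and $X^*$ is now a Gaussian random vector with i.i.d. components and with the same covariance as X under the generative model(s). We will now upper bound the entropy of Y via

$$h(Y) \le h(Y^*) \le h(Y_1^*) + \sum_{i=1}^{m-1} h(Y_{i+1}^* \mid Y_i^*) \le h(Y_1^*) + \sum_{i=1}^{m-1} h(Y_{i+1}^* - \eta_i Y_i^*)$$

where $\eta_i Y_i^*$ is the best MMSE estimate for $Y_{i+1}^*$. The MMSE estimate of $Y_{i+1}^*$ from $Y_i^*$ is given by

$$\hat{Y}_{i+1}^* = \frac{\Sigma_{Y_i Y_{i+1}}}{\Sigma_{Y_i}}\, Y_i^*$$

where $\Sigma_{Y_i Y_{i+1}} = r_i\alpha\,\mathrm{SNR}$ and $\Sigma_{Y_i} = \alpha\,\mathrm{SNR} + 1$. The result then follows by evaluating the MMSE error given by

$$E\left(Y_{i+1}^* - \frac{r_i\alpha\,\mathrm{SNR}}{\alpha\,\mathrm{SNR}+1}\, Y_i^*\right)^2 = \alpha\,\mathrm{SNR} + 1 - \frac{(r_i\alpha\,\mathrm{SNR})^2}{\alpha\,\mathrm{SNR}+1} = 1 + \alpha\,\mathrm{SNR}(1-r_i) + \frac{r_i\alpha\,\mathrm{SNR}}{\alpha\,\mathrm{SNR}+1}\left(1 + (1-r_i)\alpha\,\mathrm{SNR}\right)$$

Plugging in the quantities the result follows.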

■ 
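The two forms of the MMSE error used in the proof can be checked against each other numerically. In the sketch below `a` stands for $\alpha\,\mathrm{SNR}$ and `r` for the cross-correlation of successive rows, with unit noise variance assumed. The function names are illustrative.

```python
import math

def mmse_direct(a, r):
    """Var(Y_{i+1}) - Cov(Y_i, Y_{i+1})^2 / Var(Y_i), with a = alpha*SNR."""
    return (a + 1.0) - (r * a) ** 2 / (a + 1.0)

def mmse_as_in_proof(a, r):
    """The expanded form appearing inside the logarithm of (12)."""
    return 1.0 + a * (1.0 - r) + (r * a / (a + 1.0)) * (1.0 + (1.0 - r) * a)

def entropy_term(a, r):
    """0.5*log2 of the MMSE error: one summand of the bound, up to constants."""
    return 0.5 * math.log2(mmse_direct(a, r))
```

The two expressions agree for every `a >= 0` and `0 <= r <= 1`, and `entropy_term` decreases as `r` grows, which is the sense in which larger cross-correlation between rows lowers the bound.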

Let us see the implications of the above result for one particular type of sensing matrix architecture induced via random filtering and downsampling, considered in [20]. The output of the filter of length $L < n$ can be modeled via multiplication of X by a Toeplitz matrix (with a banded structure). The overlap between successive rows of the matrix G is $L - 1$ in this case, implying a large cross-correlation r. From Lemma 8.2 it follows that larger cross-correlation in rows implies poor sensing capacity. Also note that for a filtering architecture one has to address a coverage issue wherein it is required that $m \ge n - L + 1$. This implies that $L \ge n - m + 1$. Thus the filter length has to be sufficiently large, which implies that the cross-correlation is also large.

Fig. 6. Illustration of random sampling versus contiguous sampling in a sensor network. This leads to different structures on the sensing matrix and that leads to different performance.

Indeed randomizing each row will lead to low cross-correlation (in an expected sense) but the coverage issue still needs to be addressed. On the other hand one can subsample the output signal of length $n - L + 1$ by some factor so as to reduce the cross-correlation while still ensuring coverage. In this case the matrix becomes almost like an upper triangular matrix and there is a significant loss of sensing diversity. A loose tradeoff between the filter length L and the sampling factor d (say) immediately follows from Lemma 8.2, where the cross-correlation changes according to

$$r = \frac{L(1-d)}{n}$$
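The banded structure and the effect of downsampling on row cross-correlation can be seen in a small example. The sketch below assumes a length-L all-ones filter scaled so that every row has unit norm, and models downsampling as a row stride. The filter choice and the function names are assumptions made for illustration and do not reproduce the random filters of [20].

```python
import math

def filter_matrix(n, L, stride=1):
    """Rows are shifted copies of a unit-norm, length-L all-ones filter."""
    scale = 1.0 / math.sqrt(L)
    rows = []
    for start in range(0, n - L + 1, stride):
        row = [0.0] * n
        for j in range(start, start + L):
            row[j] = scale
        rows.append(row)
    return rows

def successive_cross_correlations(G):
    """Inner products of consecutive rows of G."""
    return [sum(a * b for a, b in zip(G[i], G[i + 1])) for i in range(len(G) - 1)]
```

With `n = 20` and `L = 5` the matrix has `n - L + 1 = 16` rows and every consecutive pair has correlation `(L - 1) / L = 0.8`. A stride of 3 leaves 6 rows with correlation 0.4, and a stride of 5 removes the overlap entirely at the cost of fewer measurements.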

IX. Upper bounds on Sensing Capacity for {0, 1} ensemble 

The main motivation for considering this ensemble comes from scenarios where randomization in the elements of G is not feasible, e.g. field estimation from smoothed data. In this case each sensor measures a superposition of the signals that are in the sensing range of the sensor. This leads us to consider other types of modalities, e.g. contiguous sampling of X by each sensor versus random sampling for $\beta < 1$. An illustration of the two types of sampling is shown in Figure 6. We reveal the following contrast between the two cases for the same $\beta < 1$.

Lemma 9.1: Random Sampling: For the {0, 1} ensemble for sensing matrices consider the case when each row of G has $\beta n$ ones randomly placed in n positions. Then for discrete $X \in \{0,1\}^n$ drawn Bernoulli($\alpha$) and for $d_0 < \alpha$,

$$C_{rand}(d_0) \le \frac{H(J)}{h_2(\alpha) - h_2(d_0)}$$

where H(.) is the discrete entropy function and where J is a random variable with distribution given by

$$\Pr(J = j) = \frac{\binom{\alpha n}{j}\binom{n(1-\alpha)}{\beta n - j}}{\binom{n}{\beta n}}$$

Proof: See Appendix. ■
Lemma 9.2: Contiguous Sampling: For the {0, 1} ensemble for sensing matrices consider the case where each row of G has $\beta n$ consecutive ones randomly placed with wrap around. Then for discrete $X \in \{0,1\}^n$ drawn Bernoulli($\alpha$) and $d_0 < \alpha$,

$$C_{contg}(d_0) \le \frac{h_2(\alpha + \beta)}{h_2(\alpha) - h_2(d_0)}$$

Proof: See Appendix. ■
As seen from the upper bounds, $C_{rand}(d_0) \ge C_{contg}(d_0)$. Thus randomization in G performs better. The difference is shown in Figure 7 for a low sparsity scenario. The proofs of Lemmas 9.1 and 9.2 follow from the upper bounds to the mutual information terms as provided in Section XII and then applying the necessary conditions for the lower bound on the probability of error to be lower bounded by zero.
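The numerators of the two bounds can be compared for finite n. The sketch below computes the entropy $H(J)$ of the hypergeometric overlap variable of Lemma 9.1 and the binary entropy $h_2(\alpha+\beta)$ of Lemma 9.2, both in bits. It assumes that $k = \alpha n$ and $l = \beta n$ are integers, and the function names are hypothetical.

```python
import math

def h2(p):
    """Binary entropy in bits, with h2(0) = h2(1) = 0 by convention."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def overlap_entropy(n, k, l):
    """Entropy in bits of J, the overlap of l random positions with k non-zeros."""
    total = math.comb(n, l)
    entropy = 0.0
    for j in range(max(0, l - (n - k)), min(k, l) + 1):
        p = math.comb(k, j) * math.comb(n - k, l - j) / total
        if p > 0.0:
            entropy -= p * math.log2(p)
    return entropy
```

For instance, with `n = 1000` and `k = l = 50` (so $\alpha = \beta = 0.05$) the value of `overlap_entropy(n, k, l)` exceeds `h2(0.1)`, which matches the ordering of the two upper bounds for this choice of parameters.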

X. Estimation of functions of X 

The analysis of lower bounds to the probability of error presented in this paper extends in a straightforward way to estimation of functions of X. In this section we will consider one such scenario that has received attention in relation to problems arising in physics. The discussion below will reveal the power of the method presented in this work: it is easily capable of handling more complicated cases and scenarios, though the computation of the terms involved in the analysis may become hard.

Fig. 7. A comparison of the upper bounds to sensing capacity for the randomized sampling versus contiguous sampling case. X is the Bernoulli model and the ensemble for G is the {0, 1} ensemble. We have selected the case of low sparsity in this case. Note that due to loose overbounding of mutual information (we basically got rid of noise) the upper bounds are greater than in the case of the Gaussian ensemble.



A. Detecting the sign pattern of X 

Of particular interest is to estimate the sign pattern of the underlying signal X. To this end define a new random variable U, via

$$U_i = \begin{cases} 1 & \text{if } X_i > 0 \\ -1 & \text{if } X_i < 0 \\ 0 & \text{if } X_i = 0 \end{cases}$$

The corresponding n dimensional extension and probability distribution on U is induced directly via $P_X$. In such a case note that $U \to X \to Y \to \hat{U}(Y)$ forms a Markov chain. To this end consider an error event defined via

$$E = \begin{cases} 1 & \text{if } U \ne \hat{U}(Y) \\ 0 & \text{otherwise} \end{cases}$$

Then we have

$$H(U,E|Y) = \underbrace{H(E|Y)}_{\le 1} + H(U|E,Y) = H(U|Y) + \underbrace{H(E|U,Y)}_{=0}$$

Thus we have

$$H(U|Y) \le 1 + P_e \underbrace{H(U|E=1,Y)}_{\le n\log 3} + (1-P_e)\underbrace{H(U|E=0,Y)}_{=0}$$

This implies

$$P_e \ge \frac{H(U) - I(U;Y|G) - 1}{n\log 3}$$

In order to evaluate $I(U;Y|G)$ we note that $I(U,X;Y|G) = I(X;Y|G)$. This follows from $I(U,X;Y|G) = H(U,X) - H(X,U|Y,G) = H(X) - H(X|G,Y) - H(U|G,Y,X) = I(X;Y|G)$. Thus $I(U;Y|G) = I(X;Y|G) - I(X;Y|G,U)$ and both these terms can be adequately bounded/evaluated.
XI. Appendix 

A. Proof of Lemma 3.1

Let $X^n = \{X_1, \dots, X_n\}$ be an i.i.d. sequence where each variable $X_i$ is distributed according to a distribution $P_X$ defined on the alphabet $\mathcal{X}$. Denote by $P_{X^n} = (P_X)^n$ the n-dimensional distribution induced by $P_X$. Let the space $\mathcal{X}^n$ be equipped with a distance measure d(., .) with the distance in n dimensions given by $d_n(X^n,Z^n) = \sum_{k=1}^{n} d(X_k, Z_k)$ for $X^n, Z^n \in \mathcal{X}^n$. Given $\epsilon > 0$, there exists a set of points $\{Z_1^n, \dots, Z^n_{N_\epsilon(n,d_0)}\} \subset \mathcal{X}^n$ such that

$$P_{X^n}\left(\bigcup_{i=1}^{N_\epsilon(n,d_0)} B_i\right) \ge 1 - \epsilon \qquad (13)$$

where $B_i = \{X^n : \frac{1}{n} d_n(X^n, Z_i^n) \le d_0\}$, i.e., the $d_0$ balls around the set of points cover the space $\mathcal{X}^n$ in probability exceeding $1 - \epsilon$.

Given such a set of points there exists a function $f(X^n) : X^n \to Z_i^n$ such that $\Pr\left(\frac{1}{n} d_n(X^n, Z_i^n) \le d_0\right) \ge 1 - \epsilon$. To this end, let $T_{P_{X^n}}$ denote the set of $\delta$-typical sequences in $\mathcal{X}^n$ that are typical $P_{X^n}$, i.e.

$$T_{P_{X^n}} = \left\{X^n : \left|-\tfrac{1}{n}\log \hat{P}(X^n) - H(X)\right| \le \delta\right\}$$

where $\hat{P}(X^n)$ is the empirical distribution induced by the sequence $X^n$. We have the following lemma from [21].

Lemma 11.1: For any $\eta > 0$ there exists an $n_0$ such that for all $n \ge n_0$,

$$\Pr\left(X^n : \left|-\tfrac{1}{n}\log \hat{P}(X^n) - H(X)\right| < \delta\right) > 1 - \eta$$

In the following we choose $\eta = \delta$. Suppose there is an algorithm $\hat{X}^n(Y)$ that produces an estimate of $X^n$ given the observation Y. To this end define an error event on the algorithm as follows,

$$E_n = \begin{cases} 1 & \text{if } \frac{1}{n} d_n(X^n, \hat{X}^n(Y)) \ge d_0 \\ 0 & \text{otherwise} \end{cases}$$

Define another event $A_n$ as follows

$$A_n = \begin{cases} 1 & \text{if } X^n \in T_{P_{X^n}} \\ 0 & \text{otherwise} \end{cases}$$

Note that since $X^n$ is drawn according to $P_{X^n}$ and given $\delta > 0$ we choose $n_0$ such that the conditions of Lemma 11.1 are satisfied. In the following we choose $n \ge n_0(\delta)$. Then a priori, $\Pr(A_n = 1) \ge (1 - \delta)$. Now, consider the following expansion,

$$H(f(X^n),E_n,A_n|Y) = H(f(X^n)|Y) + H(E_n,A_n|f(X^n),Y) = H(E_n,A_n|Y) + H(f(X^n)|E_n,A_n,Y)$$

This implies that

$$\begin{aligned} H(f(X^n)|Y) &= H(E_n,A_n|Y) - H(E_n,A_n|f(X^n),Y) + H(f(X^n)|E_n,A_n,Y) \\ &= I(E_n,A_n;f(X^n)|Y) + H(f(X^n)|E_n,A_n,Y) \\ &\le H(E_n,A_n) + H(f(X^n)|E_n,A_n,Y) \\ &\le H(E_n) + H(A_n) + H(f(X^n)|E_n,A_n,Y) \end{aligned}$$

Note that $H(E_n) \le 1$ and $H(A_n) = \delta\log\frac{1}{\delta} + (1-\delta)\log\frac{1}{1-\delta} \sim \delta$. Thus we have

$$H(f(X^n)|Y) \le 1 + \delta + P_e^n H(f(X^n)|Y,E_n=1,A_n) + (1-P_e^n) H(f(X^n)|Y,E_n=0,A_n)$$



February 1, 2008 






Now the term $P_e^n H(f(X^n)|Y,E_n=1,A_n) \le P_e^n \log N_\epsilon(n,d_0)$. Note that the second term does not go to zero. For the second term we have that

$$\begin{aligned} (1-P_e^n) H(f(X^n)|Y,E_n=0,A_n) &= P(A_n=1)(1-P_e^n) H(f(X^n)|Y,E_n=0,A_n=1) \\ &\quad + P(A_n=0)(1-P_e^n) H(f(X^n)|Y,E_n=0,A_n=0) \\ &\le (1-P_e^n) H(f(X^n)|Y,E_n=0,A_n=1) + \delta(1-P_e^n)\log N_\epsilon(n,d_0) \end{aligned}$$

The first term on the R.H.S. in the above inequality is bounded via

$$(1-P_e^n) H(f(X^n)|Y,E_n=0,A_n=1) \le (1-P_e^n)\log|\mathcal{S}|$$

where $\mathcal{S}$ is the set given by,

where $d_{set}(S_1,S_2) = \min_{s \in S_1,\, s' \in S_2} d_n(s,s')$ is the set distance between two sets. Now note that $I(f(X^n);X^n) = H(f(X^n))$ and $H(f(X^n)|Y) = H(f(X^n)) - I(f(X^n);Y) \ge H(f(X^n)) - I(X^n;Y)$, where the inequality follows from the data processing inequality over the Markov chain $f(X^n) \leftrightarrow X^n \leftrightarrow Y$. Thus we have

$$P_e^n \ge \frac{I(f(X^n);X^n) - \log|\mathcal{S}| - I(X^n;Y) - 1}{(1-\delta)\log N_\epsilon(n,d_0) - \log|\mathcal{S}|} - \frac{\delta\left(1 + \log N_\epsilon(n,d_0)\right)}{(1-\delta)\log N_\epsilon(n,d_0) - \log|\mathcal{S}|}$$
The above inequality is true for all the mappings f satisfying the distortion criteria for mapping $X^n$ and for all choices of the set satisfying the covering condition given by (13). We now state the following lemma for a minimal covering, taken from [16].

Lemma 11.2: Given $\epsilon > 0$ and the distortion measure $d_n(.,.)$, let $N_\epsilon(n,d_0)$ be the minimal number of points $Z_1^n, \dots, Z^n_{N_\epsilon(n,d_0)} \subset \mathcal{X}^n$ satisfying the covering condition

$$P_{X^n}\left(\bigcup_{i=1}^{N_\epsilon(n,d_0)} B_i\right) \ge 1 - \epsilon$$

Then

$$\limsup_{n} \frac{1}{n}\log N_\epsilon(n,d_0) = R_X(\epsilon,d_0)$$

where $R_X(\epsilon,d_0)$ is the infimum of the $\epsilon$-achievable rates at distortion level $d_0$.

Note that $\lim_{\epsilon \downarrow 0} R_X(\epsilon,d_0) = R_X(d_0)$, where $R_X(d_0) = \min_{p(\hat{X}|X)} I(X;\hat{X})$ subject to $\frac{1}{n}E(d(X^n,\hat{X}^n)) \le d_0$. In order to lower bound $P_e^n$ we choose the mapping $f(X^n)$ to correspond to the minimal cover. Also w.l.o.g. we choose $\delta = \epsilon$. We note the following.

1) From Lemma 11.1, given $\epsilon > 0$, $\exists\, n_0(\epsilon)$ such that for all $n \ge n_0(\epsilon)$, we have $\Pr(T_{P_{X^n}}) \ge 1 - \epsilon$.

2) Given $\epsilon > 0$ and for all $\beta > 0$, for the minimal cover we have from Lemma 11.2 that $\exists\, n_1(\beta)$ such that for all $n \ge n_1(\beta)$, $\log N_\epsilon(n,d_0) \le n(R_X(\epsilon,d_0) + \beta)$.

3) From the definition of the rate distortion function we have, for the choice of the functions $f(X^n)$ that satisfies the distortion criteria, $I(f(X^n);X^n) \ge nR_X(\epsilon,d_0)$.

Therefore we have for $n \ge \max(n_0,n_1)$,

$$P_e^n \ge \frac{nR_X(\epsilon,d_0) - \log|\mathcal{S}| - I(X^n;Y) - 1}{(1-\epsilon)\,n(R_X(\epsilon,d_0)+\beta) - \log|\mathcal{S}|} - \frac{\epsilon\left(1 + n(R_X(\epsilon,d_0)+\beta)\right)}{(1-\epsilon)\,n(R_X(\epsilon,d_0)+\beta) - \log|\mathcal{S}|}$$

Clearly, $\log|\mathcal{S}| < nR_X(\epsilon,d_0)$.

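Limiting case: Since the choice of $\epsilon, \beta$ is arbitrary we can choose them to be arbitrarily small. In fact we can choose $\epsilon, \beta \downarrow 0$. Also note that for every $\epsilon > 0$ and $\beta > 0$ there exists $n_2(\beta)$ such that $R_X(d_0) + \beta \ge R_X(\epsilon,d_0) \ge R_X(d_0) - \beta$. Therefore for all $n \ge \max(n_0,n_1,n_2)$, in the limiting case when $\epsilon, \beta \downarrow 0$, we have

$$P_e \ge \frac{R_X(d_0) - \frac{1}{n}\log|\mathcal{S}| - \frac{1}{n}I(X^n;Y)}{R_X(d_0) - \frac{1}{n}\log|\mathcal{S}|} - o(1)$$

This implies that

$$P_e \ge \frac{R_X(d_0) - \frac{1}{n}\log|\mathcal{S}| - \frac{1}{n}I(X^n;Y)}{R_X(d_0)} - o(1)$$

The proof then follows by identifying $K(n,d_0) = \frac{1}{n}\log|\mathcal{S}|$, which is bounded above by a constant.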
B. Proof of Lemma 3.2

Proof: Given an observation Y about the event $X^n$, define an error event

$$E = \begin{cases} 1 & \text{if } \frac{1}{n} d_H(X^n,\hat{X}^n(Y)) \ge d_0 \\ 0 & \text{otherwise} \end{cases}$$

Expanding $H(X^n,E|Y)$ in two different ways we get that

$$H(X^n|Y) \le 1 + nP_e\log(|\mathcal{X}|) + (1-P_e)H(X^n|E=0,Y)$$

Now the term

$$(1-P_e)H(X^n|E=0,Y) \le (1-P_e)\log\left(\binom{n}{d_0 n}(|\mathcal{X}|-1)^{nd_0}\right) \le n(1-P_e)\left(h(d_0) + d_0\log(|\mathcal{X}|-1)\right)$$

Then we have for the lower bound on the probability of error that

$$P_e \ge \frac{H(X^n|Y) - n\left(h(d_0) + d_0\log(|\mathcal{X}|-1)\right) - 1}{n\log(|\mathcal{X}|) - n\left(h(d_0) + d_0\log(|\mathcal{X}|-1)\right)}$$

Since $H(X^n|Y) = H(X^n) - I(X^n;Y)$ we have

$$P_e \ge \frac{n\left(H(X) - h(d_0) - d_0\log(|\mathcal{X}|-1)\right) - I(X^n;Y) - 1}{n\log(|\mathcal{X}|) - n\left(h(d_0) + d_0\log(|\mathcal{X}|-1)\right)}$$

It is known that $R_X(d_0) \ge H(X) - h(d_0) - d_0\log(|\mathcal{X}|-1)$, with equality iff

$$d_0 \le (|\mathcal{X}|-1)\min_{x \in \mathcal{X}} P_X(x)$$

see e.g. [16]. Thus for those values of distortion we have for all n,

$$P_e \ge \frac{nR_X(d_0) - I(X^n;Y) - 1}{n\log(|\mathcal{X}|) - n\left(h(d_0) + d_0\log(|\mathcal{X}|-1)\right)}$$
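The final bound is easy to evaluate numerically. The sketch below implements it for a general finite alphabet, with base-2 logarithms assumed and the result clipped at zero since a negative lower bound is trivial. The argument names are illustrative, and the mutual information is supplied by the caller (for instance through the upper bound $\frac{m}{2}\log_2(1+\alpha\,\mathrm{SNR})$ of Lemma 5.1).

```python
import math

def h2(p):
    """Binary entropy in bits, with h2(0) = h2(1) = 0 by convention."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def hamming_error_lower_bound(n, rate_distortion, mutual_info, d0, alphabet_size=2):
    """Lower bound on P_e from the last display of the proof, clipped at 0."""
    # Per-symbol log-size of a Hamming ball of normalized radius d0.
    ball = h2(d0) + d0 * math.log2(alphabet_size - 1)
    numerator = n * rate_distortion - mutual_info - 1.0
    denominator = n * math.log2(alphabet_size) - n * ball
    return max(0.0, numerator / denominator)
```

As a usage example, for a Bernoulli(0.1) source with `d0 = 0.01`, `n = 1000` and a mutual information of 50 bits (which corresponds to `m = 100` and `snr = 10.0` under the Lemma 5.1 bound), the rate distortion value is `h2(0.1) - h2(0.01)` and the resulting lower bound on the error probability is roughly 0.37.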



C. Rate distortion function for the mixture Gaussian source under squared distortion measure 

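It has been shown in [22] that the rate distortion function for a mixture of two Gaussian sources with variances given by $\sigma_1^2$ with mixture ratio $\alpha$ and $\sigma_0^2$ with mixture ratio $1 - \alpha$ is given by

$$R_{mix}(D) = \begin{cases} H(\alpha) + \frac{1-\alpha}{2}\log\left(\frac{\sigma_0^2}{D}\right) + \frac{\alpha}{2}\log\left(\frac{\sigma_1^2}{D}\right) & \text{if } D < \sigma_0^2 \\ H(\alpha) + \frac{\alpha}{2}\log\left(\frac{\alpha\sigma_1^2}{D - (1-\alpha)\sigma_0^2}\right) & \text{if } \sigma_0^2 < D \le (1-\alpha)\sigma_0^2 + \alpha\sigma_1^2 \end{cases}$$

For a strict sparsity model we have $\sigma_0^2 \to 0$, so that

$$R_{mix}(D) = H(\alpha) + \frac{\alpha}{2}\log\left(\frac{\alpha\sigma_1^2}{D}\right) \quad \text{if } 0 < D \le \alpha\sigma_1^2$$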
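The piecewise expression can be written as a small function, which also makes it easy to check that the two branches agree at $D = \sigma_0^2$. The sketch assumes base-2 logarithms, and the function and argument names are illustrative. With `var0 = 0.0` it reduces to the strict sparsity formula.

```python
import math

def h2(p):
    """Binary entropy in bits, with h2(0) = h2(1) = 0 by convention."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def rate_distortion_mixture(D, alpha, var1, var0=0.0):
    """R_mix(D) in bits for the two-component Gaussian mixture source."""
    if D <= 0.0:
        raise ValueError("distortion must be positive")
    if D <= var0:
        # Low-distortion branch: both components are described.
        return (h2(alpha)
                + 0.5 * (1.0 - alpha) * math.log2(var0 / D)
                + 0.5 * alpha * math.log2(var1 / D))
    d_max = (1.0 - alpha) * var0 + alpha * var1
    if D > d_max:
        raise ValueError("formula stated only for D <= (1-alpha)*var0 + alpha*var1")
    return h2(alpha) + 0.5 * alpha * math.log2(alpha * var1 / (D - (1.0 - alpha) * var0))
```

For the strict sparsity model with unit variance, `rate_distortion_mixture(alpha, alpha, 1.0)` returns `h2(alpha)`, i.e. at the largest distortion covered by the formula only the support pattern is described.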

D. Bounds on Mutual information 

In this section we will evaluate bounds on mutual information that will be useful in the characterization of the Sensing Capacity. Given that the matrix G is chosen independently of X, we expand the mutual information between X and (Y, G) in two different ways as follows:

$$I(X;Y,G) = \underbrace{I(X;G)}_{=0} + I(X;Y|G) = I(X;Y) + I(X;G|Y)$$

This way of expanding gives us a handle on evaluating the mutual information with respect to the structure of the resulting sensing matrix G. From the above we get that

$$I(X;Y|G) = I(X;Y) + I(X;G|Y) = h(Y) - h(Y|X) + h(G|Y) - h(G|X,Y)$$

To this end we have the following lemma.

Lemma 11.3: For a sparsity level of $\alpha$ and diversity factor of $\beta = 1$,

$$I(X;Y|G) \le \frac{m}{2}\log\left(1 + \frac{\alpha P}{N_0}\right)$$

Proof: First note that

$$h(Y) \le \frac{m}{2}\log 2\pi e(N_0 + \alpha P)$$

Since conditioned on X, Y is distributed with a Gaussian density we have

$$h(Y|X) = \frac{m}{2}\log 2\pi e\left(\frac{P}{n}\sum_{i=1}^{k} X_i^2 + N_0\right)$$

$$h(G|Y) \le h(G) = \frac{mn}{2}\log\left(2\pi e\frac{P}{n}\right)$$

Note also that conditioned on X and Y, G has a Gaussian distribution. We now evaluate $h(G|Y,X)$. First note that the rows of G are independent of each other given X and Y. So we can write

$$h(G|Y,X) = m\,h(g_1|Y,X)$$

where $g_1$ is the first row of the matrix G. Since $g_1$ is Gaussian one can find the residual entropy in terms of the residual MMSE error in the estimation of $g_1$ given X and Y. This error is given by

$$\mathrm{MMSE}_{g_1|Y,X} = \Sigma_{g_1|X} - \Sigma_{g_1Y|X}\Sigma_{Y|X}^{-1}\Sigma_{g_1Y|X}^T = \Sigma_{g_1} - \Sigma_{g_1Y_1|X}\Sigma_{Y_1|X}^{-1}\Sigma_{g_1Y_1|X}^T$$

The second equation follows from the fact that G is independent of X and, given X, the row $g_1$ is independent of the other observations $Y_2, \dots, Y_m$. First note that given X we also know which positions of X are zeros. So without loss of generality we can assume that the first k elements of X are non-zero and the rest are zeros. Now note the following,

$$\Sigma_{g_1} = \frac{P}{n} I_n, \qquad \Sigma_{g_1Y_1|X} = \frac{P}{n}\begin{pmatrix} X_{1:k} \\ 0_{n-k} \end{pmatrix}, \qquad \Sigma_{Y_1|X} = \frac{P}{n}\sum_{i=1}^{k} X_i^2 + N_0$$

where $0_{n-k}$ is a column vector of $n - k$ zeros. Therefore we have

$$h(g_1|Y_1,X) = \frac{1}{2}\log\left((2\pi e)^k \det\left(\frac{P}{n}I_k - \frac{P}{n}X_{1:k}\Sigma_{Y_1|X}^{-1}\frac{P}{n}X_{1:k}^T\right)\right) + \frac{n-k}{2}\log 2\pi e\frac{P}{n}$$

Note that the second term on the R.H.S. in the above equation corresponds to the entropy of those elements of the row $g_1$ that have no correlation with Y, i.e. nothing can be inferred about these elements since they overlap with zero elements of X. Now, using the identity $\det(I + AB) = \det(I + BA)$, we have that

$$h(g_1|Y_1,X) = \frac{1}{2}\log\left(\left(2\pi e\frac{P}{n}\right)^k\left(1 - X_{1:k}^T\Sigma_{Y_1|X}^{-1}\frac{P}{n}X_{1:k}\right)\right) + \frac{n-k}{2}\log 2\pi e\frac{P}{n}$$

Plugging in all the expressions we get an upper bound on the mutual information $I(X;Y|G)$,

$$I(X;Y|G) \le \frac{m}{2}\log\left(1 + \frac{\alpha P}{N_0}\right)$$

■

In contrast to the upper bound derived in the proofs of Lemmas 5.1 and 5.2, this alternate derivation provides a handle to understand the effect of the structure of G on the mutual information when one is not allowed to pick a maximizing input distribution on X. Moreover the above derivation can potentially handle scenarios of correlated G. Below we will use the above result in order to prove Lemma 8.1.

E. Proof of Lemma 8.1

To this end let $l = \beta n$ be fixed, i.e. there are only l non-zero terms in each row of the matrix G. We have

$$h(G) = \frac{ml}{2}\log 2\pi e\frac{P}{l} + m\,h_2(\beta)$$

Now we will first evaluate $h(G|Y,X)$. Proceeding as in the derivation of Lemma 11.3 we have that

$$h(G|X,Y) = m\,h(g_1|Y_1,X) + m\,h_2(\beta)$$

where one can see that if the matrix G is chosen from a Gaussian ensemble then, given X and Y, it tells nothing about the positions of the non-zeros in each row. Hence the additive term $h_2(\beta)$ appears in both terms and is thus canceled in the overall calculations. So we will omit this term in the subsequent calculations. To this end, let j denote the number of overlaps of the vector $g_1$ and the k-sparse vector X. Given $Y_1$ and X one can only infer something about those elements of G that contribute to $Y_1$. Given the number of overlaps j we then have

$$h(g_1|X,Y_1,j) = \frac{l-j}{2}\log 2\pi e\frac{P}{l} + \frac{1}{2}\log\left(\left(2\pi e\frac{P}{l}\right)^j \frac{N_0}{\frac{P}{l}\sum_{i=1}^{j} X_i^2 + N_0}\right)$$

where we have assumed without loss of generality that the first j elements of X are non-zero and overlap with elements of the first row. Now note that

$$h(Y|j) \le \frac{m}{2}\log 2\pi e\left(\frac{jP}{l} + N_0\right), \qquad h(Y|X,j) = \frac{m}{2}\log 2\pi e\left(\frac{P}{l}\sum_{i=1}^{j} X_i^2 + N_0\right)$$

From the above we have that

$$I(X;Y|G,j) \le \frac{m}{2}\log\left(1 + \frac{jP}{lN_0}\right)$$

Taking the expectation with respect to the variable j we have

$$I(X;Y|G) \le \frac{m}{2}E_j\left[\log\left(1 + \frac{jP}{lN_0}\right)\right]$$

Note that $j \le \min\{k,l\}$ and has a distribution given by

$$\Pr(j) = \frac{\binom{k}{j}\binom{n-k}{l-j}}{\binom{n}{l}}$$
XII. Upper bounds to Mutual information for {0, 1} ensemble 



In this section we will derive upper bounds to the mutual information $I(X;Y|G)$ for the case when the matrix is chosen from a {0, 1} ensemble. First it is easily seen that for this ensemble full diversity leads to loss of rank and thus the mutual information is close to zero. So we will only consider the case $\beta < 1$.

A. Random locations of 1's in G

In this section we will provide simple upper bounds to the mutual information $I(X;Y|G)$ for the case of the {0, 1} ensemble of sensing matrices. Note that

$$I(X;Y|G) \le I(X;GX|G)$$

Let $\tilde{Y} = GX$. Then we have

$$I(X;\tilde{Y}|G) = I(X;\tilde{Y}) + I(X;G|\tilde{Y})$$

Now note that $\frac{1}{m}I(X;\tilde{Y}) = o(1)$. Then we need to evaluate $I(G;X|\tilde{Y}) \le H(G) - H(G|\tilde{Y},X)$. Now note that since each row of G is an independent Bernoulli($\beta$) sequence we can split the entropy into a sum of the entropies of the individual rows. To this end focus on the first row. Then conditioned on there being l 1's in the row we have $H(G_1|l) \le \log\binom{n}{l}$. Given that X is k-sparse we have

$$H(G_1|X,\tilde{Y}_1,k,l) = \sum_{j=0}^{\min(k,l)} \frac{\binom{k}{j}\binom{n-k}{l-j}}{\binom{n}{l}}\log\binom{k}{j}\binom{n-k}{l-j}$$



Thus we have

$$I(X;G_1|\tilde{Y}_1,k,l) \le \log\binom{n}{l} - \sum_{j=0}^{\min(k,l)} \frac{\binom{k}{j}\binom{n-k}{l-j}}{\binom{n}{l}}\log\binom{k}{j}\binom{n-k}{l-j} = H(J|k,l)$$

where J is a random variable with distribution given by

$$\Pr(J = j) = \frac{\binom{k}{j}\binom{n-k}{l-j}}{\binom{n}{l}}$$

For large enough n, $k = \alpha n$ and $l = \beta n$ w.h.p. Thus $I(X;G_1|\tilde{Y}_1) \le H(\tilde{J})$, where $\tilde{J}$ has a limiting distribution given by

$$\Pr(\tilde{J} = j) = \frac{\binom{\alpha n}{j}\binom{n(1-\alpha)}{\beta n - j}}{\binom{n}{\beta n}}$$

In other words, given $\epsilon > 0$ there exists an $n_0$ such that for all $n \ge n_0$, $\sup_j |P_J(j) - P_{\tilde{J}}(j)| \le \epsilon$, and by continuity of the entropy function ([16], pp. 33, Lemma 2.7), it follows that $|H(J) - H(\tilde{J})| \le -\epsilon\log\frac{\epsilon}{n}$.

B. Contiguous sampling

In this case for each row we have $H(G_1) = \log n$. To evaluate $H(G_1|X,Y)$, fix the number of ones in $G_1$ to be equal to l and the number of non-zero elements in X to be equal to k. Now note that if $Y_1 = 0$ then there is no overlap in $G_1$ and X. This means that the row of G can have contiguous ones in $n - k - l$ positions equally likely. The probability of no overlap is $\frac{n-k-l}{n}$. On the other hand if $Y_1 > 0$, then the uncertainty in the locations of ones in $G_1$ reduces to $\log(k + l)$. The probability that $Y_1 > 0$ is $\frac{k+l}{n}$. Thus we have

$$I(G;X|Y) \le mH(O)$$

where O is a binary random variable with distribution $\left(1 - \frac{k+l}{n}, \frac{k+l}{n}\right)$. For large enough n this comes close to $(1 - (\alpha+\beta), \alpha+\beta)$. Thus we have

$$I(G;X|Y) \le mH(\alpha + \beta)$$